Abstract

In this paper, we reduce Prize-Collecting Steiner TSP (PCTSP), Prize-Collecting Stroll (PCS), Prize-Collecting Steiner Tree (PCST), Prize-Collecting Steiner Forest (PCSF), and more generally Submodular Prize-Collecting Steiner Forest (SPCSF) on planar graphs (and more generally bounded-genus graphs) to the same problems on graphs of bounded treewidth. More precisely, we show that any $\alpha$-approximation algorithm for these problems on graphs of bounded treewidth gives an $(\alpha + \epsilon)$-approximation algorithm for these problems on planar graphs (and more generally bounded-genus graphs), for any constant $\epsilon > 0$. Since PCS, PCTSP, and PCST can be solved exactly on graphs of bounded treewidth using dynamic programming, we obtain PTASs for these problems on planar graphs and bounded-genus graphs. In contrast, we show that PCSF is APX-hard on series-parallel graphs, which are planar graphs of treewidth at most 2. This result is interesting in its own right because it gives the first provable hardness separation between prize-collecting and non-prize-collecting (regular) versions of the problems: regular Steiner Forest is known to be polynomially solvable on series-parallel graphs and admits a PTAS on graphs of bounded treewidth. An analogous hardness result can be shown for Euclidean PCSF. This dispels the common belief that prize-collecting variants should not add any new hardness to the problems.
1 Introduction



Prize-collecting problems involve situations where there are various demands that desire to be "served" by some structure, and we must find the structure of lowest cost to accomplish this. However, if some of the demands are too expensive to serve, then we can refuse to serve them and instead pay a penalty. In particular, prize-collecting Steiner problems are well-known network design problems with several applications in expanding telecommunications networks (see for example [46, 52]), cost sharing, and Lagrangian relaxation techniques (see e.g. [45, 21]). A general form of these problems is the Prize-Collecting Steiner Forest (PCSF) problem$^1$: given a network (graph) $G = (V, E)$, a set of source-sink pairs$^2$ $\mathcal{D} = \{\{s_1, t_1\}, \{s_2, t_2\}, \ldots, \{s_k, t_k\}\}$, a non-negative cost function $c : E \to \mathbb{R}^+$, and a non-negative penalty function $\pi : \mathcal{D} \to \mathbb{R}^+$, our goal is a minimum-cost way of installing (buying) a set of links (edges) and paying the penalty for those pairs which are not connected via installed links. We also consider the problem with a general penalty function, called Submodular Prize-Collecting Steiner Forest (SPCSF), in which the penalty function $\pi$ is a monotone non-negative submodular function$^3$ of the set of all unsatisfied pairs. In PCSF, when all penalties are $\infty$, the problem is the classic APX-hard Steiner Forest problem, for which the best known approximation ratio is $2 - O(1/n)$ ($n$ is the number of nodes of the graph), due to Agrawal, Klein, and Ravi [2] (see also [35] for a more general result and a simpler analysis). The case of the Prize-Collecting Steiner Forest problem in which all sinks are identical is the classic (rooted) Prize-Collecting Steiner Tree (PCST) problem. In the unrooted version of this problem, there is no specific sink (root), and the goal is to find a tree connecting some sources and pay the penalty for the rest of them. We also study two variants of (unrooted) Prize-Collecting Steiner Tree, Prize-Collecting TSP (PCTSP) and Prize-Collecting Stroll (PCS), in which the set of edges should form a cycle and a path, respectively, instead of a tree. When, in addition, all penalties are $\infty$ in these prize-collecting problems, we obtain the classic APX-hard problems Steiner Tree, TSP, and Stroll (Path TSP), for which the best known approximation factors are, respectively, 1.38 [16], 3/2 [20], and 5/3 [42].

In network design, planarity is a natural restriction, since in practical scenarios of physical networking, with cable or fiber embedded in the ground, crossings are rare or nonexistent. Thus, algorithms with better approximation factors are highly desirable in this case. In many cases, approximation algorithms for planar graphs are based on reducing the problem to bounded-treewidth instances such that the optimum changes only by a small term. This idea goes back to the classical work of Baker [9] and has been applied successfully many times in various contexts. The algorithmic and graph-theoretic properties of treewidth have been studied intensively, and a well-understood dynamic programming technique can solve NP-hard problems on bounded-treewidth graphs. Our goal is to understand how far this paradigm can be pushed: what are the most general problems that can be solved this way? In particular, we want to understand the applicability of this technique to prize-collecting variants of standard optimization problems.

TSP, Steiner Tree, and Steiner Forest have all been considered extensively on planar graphs. Indeed, all these problems remain NP-hard even on planar graphs [29]. However, obtaining a PTAS

1 In the literature, this problem is also called Prize-Collecting Generalized Steiner Tree. 

2 Source-sink pairs are sometimes called demands. 

3 A function $f : 2^S \to \mathbb{R}$ is called submodular if and only if $f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq S$. An equivalent characterization is that the marginal profit of each item should be non-increasing, i.e., $f(A \cup \{a\}) - f(A) \le f(B \cup \{a\}) - f(B)$ if $B \subseteq A \subseteq S$ and $a \in S \setminus A$. A function $f : 2^S \to \mathbb{R}$ is monotone if and only if $f(A) \le f(B)$ for $A \subseteq B \subseteq S$. Since the number of sets is exponential, we assume value-oracle access to the submodular function; i.e., for a given set $T$, an algorithm can query an oracle to find its value $f(T)$.
for each of these problems remained an important open problem for several years. Grigni, Koutsoupias, and Papadimitriou [36] obtained the first PTAS for TSP on unweighted planar graphs in 1995, which was later generalized to weighted planar graphs [6] (and improved to linear time [47]). A PTAS for Steiner Tree on planar graphs remained elusive for almost 12 years, until 2007, when Borradaile, Klein, and Mathieu [15] obtained the first PTAS for Steiner Tree on planar graphs using a revolutionary technique of contraction decomposition and spanner construction; they posed obtaining a PTAS for Steiner Forest on planar graphs as the main open problem. Bateni, Hajiaghayi, and Marx [12] very recently solved this open problem using a new primal-dual technique for building spanners and obtaining PTASs by reducing the problem to bounded-treewidth graphs. Note that the Steiner Forest problem already shows signs of the reduction-to-bounded-treewidth paradigm breaking down: surprisingly, Steiner Forest turns out to be NP-hard even on graphs of treewidth 3. However, [12] gets around this problem by using a PTAS on bounded-treewidth graphs instead of an exact algorithm.

Obtaining PTASs for prize-collecting versions of these problems remained a main open problem (see [12, 11]). It is not obvious how to generalize the reduction to bounded treewidth for these problems; in particular, new techniques are needed for handling penalties before building a spanner. In this paper, we resolve these open problems for all three of PCST, PCTSP, and PCSF, and even more generally for SPCSF, by reducing these problems on planar graphs to the same problems on graphs of bounded treewidth. More precisely, we show that any $\alpha$-approximation algorithm for these problems on graphs of bounded treewidth gives an $(\alpha + \epsilon)$-approximation algorithm for these problems on planar graphs and bounded-genus graphs, for any constant $\epsilon > 0$. Therefore, we demonstrate that the technique of reduction to bounded treewidth works even for very general versions of problems involving prizes. Since PCST and PCTSP can be solved exactly on graphs of bounded treewidth using standard dynamic programming techniques (as we discuss later in the paper), we immediately obtain PTASs for PCST and PCTSP on planar graphs (the same holds for PCS as well). In contrast, we show that PCSF is APX-hard already on series-parallel graphs, which are planar graphs of treewidth at most 2, ruling out any hope for a PTAS for planar PCSF. This result is interesting in its own right, since it gives the first provable hardness separation between prize-collecting and non-prize-collecting (regular) versions of the problems: regular Steiner Forest is known to be polynomially solvable on series-parallel graphs and, more generally, admits a PTAS on graphs of bounded treewidth [12]. An analogous hardness result can be given for Euclidean PCSF, where the vertices of the input graph are points in the Euclidean plane and the lengths are Euclidean distances (which answers an open problem in [11]). This dispels the common belief that prize-collecting variants should not add any new hardness to the problems.

Related work. PCST and PCTSP are two of the classic optimization problems with a large impact, both in theory and practice. At AT&T, PCST code has been used in large-scale studies in access network design, both as described in Johnson, Minkoff, and Phillips [46] and in other unpublished applied work by Archer et al. The impact of PCTSP within approximation algorithms is also far-reaching. In particular, PCTSP is a Lagrangian relaxation of the $k$-MST problem, which asks for the minimum-cost tree spanning at least $k$ nodes, and has been used in a sequence of papers ([30, 8, 22, 7]) culminating in a 2-approximation algorithm for $k$-MST by Garg [31]. PCTSP has also been used to improve the approximation ratio and running time of algorithms for the Minimum Latency problem ([5, 18]). The first approximation algorithms for the PCST and PCTSP problems were given by Bienstock et al. [13], although PCTSP had been introduced earlier by Balas [10]. Bienstock et al. achieved a factor of 3 for PCST and 2.5 for PCTSP by rounding the optimal solution to a linear programming (LP) relaxation. Later, Goemans and Williamson [34] constructed primal-dual algorithms using the same LP relaxation to obtain a 2-approximation for both problems, building on work of Agrawal, Klein, and Ravi [2]. Chaudhuri et al. modified the Goemans-Williamson algorithm to achieve a 2-approximation algorithm for PCS [18]. Improving over the approximation factor 2 of Goemans and Williamson for PCST and PCTSP was a long-standing open problem for 17 years, until recently Archer, Bateni, Hajiaghayi, and Karloff [4] obtained constant factors strictly better than 2 ($\approx 1.99$) for both problems, and for PCS as well. More recently, Goemans combined some ideas of [4] with others from [32] to improve the ratio for PCTSP below 1.915 [33].

The general form of the Prize-Collecting Steiner Forest problem was first formulated by Hajiaghayi and Jain [38]. They showed that applying a primal-dual method to a novel integer programming formulation of the problem with doubly-exponentially many variables yields a 3-approximation algorithm for the problem. In addition, they showed that the factor 3 in the analysis of their algorithm is tight. However, they showed how a direct randomized LP-rounding algorithm with approximation factor 2.54 can be obtained for this problem. Their approach has been generalized by Sharma, Swamy, and Williamson [53] to network design problems where arbitrary 0-1 connectivity constraints may be violated in exchange for a more general penalty function. Hajiaghayi and Nasri [40] show that factor 3 for Prize-Collecting Steiner Forest can also be obtained via an iterative rounding approach, first introduced by Jain [44], and that factor 3 is indeed the best one can hope for via this approach. The work of Hajiaghayi and Jain has also motivated a game-theoretic version of the problem considered by Gupta et al. [37]. Very recently, Hajiaghayi et al. [39] obtained a 2.54-approximation algorithm for the more general problem SPCSF. As mentioned before, our reduction from planar graphs to graphs of bounded treewidth works even for SPCSF. It is worth mentioning that optimizing a submodular function, a discrete analog of a convex function that also captures economies of scale, is a central and very general problem in combinatorial optimization and has been the subject of thorough study in many important settings, including cuts in graphs [43, 35, 49], plant location problems [24, 23], rank functions of matroids [26], set covering problems [27], and certain restricted satisfiability problems [41, 28].

Remark. Subsequent to, and independent of, our work, Chekuri et al. [19] obtained a subset of our results, including a reduction for prize-collecting Steiner tree and prize-collecting Steiner forest from planar graphs to graphs of bounded treewidth (i.e., a weaker version of our Theorem 1, albeit with different techniques), which leads to a PTAS for planar prize-collecting Steiner tree. The hardness results, however, are unique to our work.

2 Contributions 

We first formally define the most general problem studied in this paper. An instance of Submodular Prize-Collecting Steiner Forest (SPCSF) is described by a triple $(G, \mathcal{D}, \pi)$, where $G$ is an undirected weighted graph, $\mathcal{D}$ is a set of demand pairs $d_i = \{s_i, t_i\}$, and $\pi : 2^{\mathcal{D}} \to \mathbb{R}^+$ is a monotone non-negative submodular penalty function. A demand $d = \{s, t\}$ is satisfied by a subgraph $F$ if and only if $s$ and $t$ are connected in $F$. If a forest $F$ satisfies a subset $\mathcal{D}^{\mathrm{sat}}$ of the demands, its cost is defined as $\mathrm{cost}(F) := \mathrm{length}(F) + \pi(\mathcal{D}^{\mathrm{unsat}})$, where $\mathrm{length}(F)$ is shorthand for the total length of all edges in $F$, and $\mathcal{D}^{\mathrm{unsat}} := \mathcal{D} \setminus \mathcal{D}^{\mathrm{sat}}$ denotes the subset of unsatisfied demands.
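To make the objective concrete, the following minimal Python sketch (an illustration, not code from this paper) evaluates $\mathrm{cost}(F)$ for a candidate forest: it finds the connected components of $F$ with a union-find structure, collects $\mathcal{D}^{\mathrm{unsat}}$, and adds the penalty returned by a value oracle for $\pi$. The helper name `spcsf_cost`, the encoding of demands as vertex pairs, and the representation of $\pi$ as a Python callable on frozensets of demands are all assumptions made for this example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]
Demand = Tuple[Vertex, Vertex]


def spcsf_cost(forest: List[Edge], length: Dict[Edge, float], demands: List[Demand],
               penalty: Callable[[FrozenSet[Demand]], float]) -> Tuple[float, FrozenSet[Demand]]:
    """Hypothetical helper: cost(F) = length(F) + pi(D_unsat), with pi given as a value oracle."""
    parent: Dict[Vertex, Vertex] = {}

    def find(x: Vertex) -> Vertex:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in forest:  # union the endpoints of every edge of F
        parent[find(u)] = find(v)
    # A demand {s, t} is satisfied iff s and t lie in the same component of F.
    unsat = frozenset(d for d in demands if find(d[0]) != find(d[1]))
    return sum(length[e] for e in forest) + penalty(unsat), unsat
```

For example, with the path `a-b-c` as the forest (edge lengths 2 and 3), demands `("a", "c")` and `("a", "d")`, and the additive penalty 10 and 4, only `("a", "d")` is unsatisfied and the cost is 5 + 4 = 9. Passing a non-additive oracle, such as a capped sum, exercises the submodular case without changing the helper.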
We similarly define SPCTSP, SPCS, and SPCST, the submodular prize-collecting variants of the Travelling Salesman Problem, Stroll, and Steiner Tree, respectively. The instance is represented by $(G, \mathcal{D}, \pi)$, where all the demands $d = \{s, t\} \in \mathcal{D}$ share a common root vertex $r \in V(G)$.$^4$ A solution $F$ is a TSP tour (stroll or Steiner tree, respectively) for a subset of demands, say $\mathcal{D}^{\mathrm{sat}} \subseteq \mathcal{D}$. The cost is then $\mathrm{cost}(F) := \mathrm{length}(F) + \pi(\mathcal{D}^{\mathrm{unsat}})$, where $\mathcal{D}^{\mathrm{unsat}} := \mathcal{D} \setminus \mathcal{D}^{\mathrm{sat}}$.

We first show that Submodular Prize-Collecting Steiner Forest on planar graphs (or, more generally, bounded-genus graphs) is almost equivalent to the same problem on graphs of bounded treewidth; refer to Appendix A for definitions of treewidth and bounded-treewidth graphs, as well as bounded-genus graphs. In particular, if we were able to give a PTAS for SPCSF on graphs of bounded treewidth, we would readily obtain a PTAS for SPCSF on bounded-genus graphs. In the rest of the paper, we focus on planar graphs. All the algorithms and analyses can be extended with minor modifications to work for bounded-genus graphs.

Theorem 1. For any given constant $\epsilon > 0$, an $\alpha$-approximation algorithm for SPCSF on graphs of bounded treewidth gives an $(\alpha + \epsilon)$-approximation algorithm for SPCSF on planar graphs.

The core of the reduction is based on a prize-collecting clustering technique that was first implicitly used in [4] and later developed in [12]. In this work, the clustering technique is generalized as follows. First, we need to extend the ideas to work for prize-collecting variants of Steiner network problems. This can indeed make the problem provably harder; see Theorem 3. The original prize-collecting clustering associates a potential value with each node and grows the corresponding clusters, consuming these potentials. However, in order to extend it to the prize-collecting setting, we consider source-sink potentials. This means that there is some interaction between the potentials of different nodes. Second, we consider submodular penalty functions, which model even more interaction between the demands. The extended prize-collecting clustering procedure has two phases. In the first phase, we run a source-sink moat-growing algorithm, and in the second phase, we run a single-node potential moat-growing algorithm as in [12].

Section 3 is devoted to the formal proof of Theorem 1. The algorithm starts with a constant-approximate solution $F_1$, say, obtained using Hajiaghayi et al. [39], who prove a 3-approximation for SPCSF on general graphs. The forest $F_1$ satisfies a subset of demands, and we know that the total penalty of unsatisfied demands is bounded. The algorithm then tries to satisfy more demands by constructing a forest $F_2 \supseteq F_1$ whose length is bounded; see RestrictDemands in Section 3.2. This step heavily uses a submodular prize-collecting clustering algorithm$^5$ introduced in Section 3.1. At the end of this step, we can assume that the near-optimal solution does not satisfy the demands which are unsatisfied in $F_2$. Submodularity poses several difficulties in proving this property: ideally, we want to say that the cost paid by the optimal solution to satisfy these demands is significantly more than their penalty value. Surprisingly, this is not true. Nevertheless, we can prove that the marginal cost of the demands satisfied in the near-optimal solution but not in $F_2$ can be charged to the cost the near-optimal solution pays in order to satisfy them. The next step of the reduction is to build a forest $F_3 \supseteq F_2$ of bounded length that may connect several components of $F_2$

4 The problems may be more naturally defined with single-vertex demands rather than demand pairs; given such a formulation, we can guess one vertex of the solution, designate it as the root, and obtain the rooted formulation as defined in this paper.

5 The algorithm bears some similarity to the primal-dual moat-growing algorithms for Steiner network problems. One key difference is that we do not have a primal LP. We have an LP similar to the dual linear programs used in such algorithms, and we use a notion of potential as a substitute for the missing primal LP. The potentials, among other things, play the role of an upper bound on the value of the dual LP.
together; see Section 3.3. This is done by assigning to each component of $F_2$ a potential proportional to its length, and then running a prize-collecting clustering similar to that of [12]. This guarantees that the near-optimal solution does not need to connect different components of $F_3$ to each other. The implication is that we can construct a spanner (see [12, 15, 47]) out of each component of $F_3$ separately from the others. In the previous work [12], we could solve each of the subinstances independently; however, the penalty interaction originating from the submodular penalty function in the current work does not allow us to solve each subinstance completely independently. Instead, we argue that the forest of the near-optimal solution on each subinstance is independent of the others. After constructing the spanner graph $F_4$, we invoke a generalization of the shifting idea of Baker [9] due to [25, 47]. Paying a cost of at most $\epsilon \cdot \mathrm{OPT}$, we end up with a graph of bounded treewidth.

Since bounded-treewidth graphs bear some similarity to trees, several tools have been developed 
for solving optimization problems on them. Standard techniques, see Appendix B, allow us to obtain 
PTASs for several Steiner network problems on graphs of bounded treewidth. 

Theorem 2. PCST, PCS and PCTSP admit PTASs on bounded-treewidth graphs. 

In Section 4 we show how this results in PTASs for the above problems on planar graphs. In particular, this is simple for PCST, since it is a special case of SPCSF. The other two problems require more care; see the discussion in Section 4.

In contrast, we show that Prize-Collecting Steiner Forest is APX-hard, even on planar graphs of treewidth at most two; Hajiaghayi and Jain show that the problem can be solved in polynomial time on tree metrics [38].

Theorem 3. PCSF is APX-hard on (1) planar graphs of treewidth two and on (2) the two- 
dimensional Euclidean metric. 

This is done via a reduction from Bounded-Degree Vertex Cover in Appendix 5. Indeed, the result shows that Submodular Prize-Collecting Steiner Tree (the version of the problem in which the solution has to be a connected tree instead of a forest) is also APX-hard. This indicates that the hardness of PCSF originates from the interaction between the penalties of terminals rather than from the different components of the solution.

Surprisingly, the hardness also works for Euclidean metrics, answering an open question raised 
in [11]. This is a very rare instance where a natural network optimization problem is APX-hard on 
the two-dimensional Euclidean plane. 

Theorem 3 means that planar PCSF reaches a level of complexity where, even though the reduction to bounded-treewidth instances works, it does not give us a PTAS for the problem (in fact, no PTAS exists unless P = NP). However, the treewidth reduction approach can still be useful for obtaining constant-factor approximations for planar graphs better than the factor-2.54 algorithm of [38] for general graphs. Theorem 1 shows that beating the 2.54 factor on bounded-treewidth graphs would immediately imply the same for planar graphs. We pose it as an open question whether this is indeed possible for PCSF.

3 Reduction to bounded-treewidth case 

This section focuses on proving Theorem 1. In fact, we prove a stronger version of the theorem that is necessary for obtaining PTASs for PCST, PCTSP, and PCS. We reduce an instance $(G, \mathcal{D}, \pi)$ of SPCSF to an instance $(H, \mathcal{D}, \pi')$ where $H$ has bounded treewidth and $\pi'$ has a structure similar to $\pi$; in particular, for some $\mathcal{D}^{\mathrm{unsat}} \subseteq \mathcal{D}$ we define $\pi'(D) := \pi(D \cup \mathcal{D}^{\mathrm{unsat}})$ for all $D \subseteq \mathcal{D}$. Notice that if $\pi$ is submodular, then so is $\pi'$. Moreover, if $\pi$ models a PCSF instance, i.e., $\pi$ is an additive function, then $\pi'(D) - \pi'(\emptyset)$ models a PCSF instance, too. In fact, restricted to subsets of $\mathcal{D} \setminus \mathcal{D}^{\mathrm{unsat}}$, $\pi'$ is an additive function shifted by the fixed amount $\pi'(\emptyset)$. The same holds for PCST, PCTSP, and PCS. Therefore, after reducing a PCST instance, we are left with a PCST instance (rather than an SPCSF one) on a bounded-treewidth graph.
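The claims that $\pi'(D) := \pi(D \cup \mathcal{D}^{\mathrm{unsat}})$ inherits submodularity and monotonicity, and that an additive $\pi$ yields a shifted additive $\pi'$, can be sanity-checked by brute force on tiny ground sets. The Python sketch below is only an illustration: it enumerates all subsets and is therefore exponential in the number of demands, and the helper names `subsets`, `is_submodular`, `is_monotone`, and `restrict` are hypothetical.

```python
from itertools import chain, combinations


def subsets(ground):
    """All subsets of a small ground set, as frozensets."""
    items = list(ground)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def is_submodular(f, ground):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) for all A, B."""
    subs = subsets(ground)
    return all(f(A) + f(B) >= f(A | B) + f(A & B) for A in subs for B in subs)


def is_monotone(f, ground):
    """Brute-force check of f(A) <= f(B) whenever A is a subset of B."""
    subs = subsets(ground)
    return all(f(A) <= f(B) for A in subs for B in subs if A <= B)


def restrict(pi, unsat):
    """pi'(D) := pi(D | D_unsat), the penalty function of the reduced instance."""
    return lambda D: pi(frozenset(D) | unsat)
```

For example, a coverage function, in which each demand covers a set of regions and $\pi(D)$ counts the regions covered by $D$, passes both checks, and so does its restriction to the remaining demands after one demand is moved into $\mathcal{D}^{\mathrm{unsat}}$. With an additive $\pi$, `restrict(pi, unsat)(D) - restrict(pi, unsat)(frozenset())` equals `pi(D)` for every `D` disjoint from `unsat`, which is the shift described above.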
The proof has three steps: 

1. We start with an instance $(G, \mathcal{D}, \pi)$ of SPCSF. We first take out a subset, say $\mathcal{D}^{\mathrm{unsat}}$, of demands whose cost of satisfying is too high compared to their penalties. Thus, we can focus on the remaining demands, say $\mathcal{D}^{\mathrm{sat}} := \mathcal{D} \setminus \mathcal{D}^{\mathrm{unsat}}$.

2. Afterwards, we partition the remaining demands $\mathcal{D}^{\mathrm{sat}}$ into $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_p$ such that, roughly speaking, SPCSF can be solved separately on each of the demand sets without increasing the total cost substantially.

3. Finally, we build a spanner for each demand set $\mathcal{D}_i$, and use similar ideas as in [12] to reduce the problem to bounded-treewidth graphs.

The first step is carried out in the following theorem. The proof appears in Section 3.2 and uses a submodular prize-collecting clustering technique introduced in Section 3.1. This step allows us to focus on only a subset $\mathcal{D}^{\mathrm{sat}}$ of demands and ignore the rest of the demands. The additional cost due to this is only $\epsilon \cdot \mathrm{OPT}$.

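Theorem 4. Given an instance $(G, \mathcal{D}, \pi)$ of SPCSF (or SPCTSP or SPCS) and a parameter $\epsilon > 0$, we can construct in polynomial time a subgraph $F$ of $G$ satisfying only a subset $\mathcal{D}^{\mathrm{sat}} \subseteq \mathcal{D}$ of demands, in effect leaving $\mathcal{D}^{\mathrm{unsat}} := \mathcal{D} \setminus \mathcal{D}^{\mathrm{sat}}$ unsatisfied, such that

1. $\mathrm{length}(F) \le (6\epsilon^{-1} + 3)\,\mathrm{OPT}$; and

2. the optimum of $(G, \mathcal{D}^{\mathrm{sat}}, \pi')$ is at most $(1 + \epsilon)\,\mathrm{OPT}$, where $\pi'(D) := \pi(D \cup \mathcal{D}^{\mathrm{unsat}})$ is defined for $D \subseteq \mathcal{D}^{\mathrm{sat}}$.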

At this point, we have a constant-approximate solution satisfying all the (remaining) demands. 
The second step is a generalization and extension of the work in [12]. We are trying to break the 
instance into smaller pieces. The solution to each piece is almost independent of the others, i.e., 
there is little interaction between them. The following theorem is proved in Section 3.3. 

Theorem 5. Given an instance $(G, \mathcal{D}, \pi)$ of SPCSF, a forest $F$ satisfying all the demands, and a parameter $\epsilon > 0$, we can compute in polynomial time a set of trees $\{T_1, \ldots, T_k\}$ and a partition of demands $\{\mathcal{D}_1, \ldots, \mathcal{D}_k\}$ with the following properties.

1. All the demands are covered, i.e., $\mathcal{D} = \bigcup_{i=1}^{k} \mathcal{D}_i$.

2. The tree $T_i$ spans all the terminals in $\mathcal{D}_i$.

3. The total length of the trees $T_i$ is within a constant factor of the length of $F$, i.e., $\sum_{i=1}^{k} \mathrm{length}(T_i) \le (f + 1)\,\mathrm{length}(F)$.
4. Let $\mathcal{D}^*$ be the subset of demands satisfied by OPT. Define $\mathcal{D}_i^* := \mathcal{D}^* \cap \mathcal{D}_i$, and denote by $\mathrm{SteinerForest}(G, \mathcal{D})$ the length of a minimum Steiner forest of $G$ satisfying the demands $\mathcal{D}$. We have $\sum_i \mathrm{SteinerForest}(G, \mathcal{D}_i^*) \le (1 + \epsilon)\,\mathrm{SteinerForest}(G, \mathcal{D}^*)$.



The final step is very similar to the spanner construction of [12, 15]. Since it has been extensively 
covered in those works, we defer the details to the full version of the paper. 

Now we show how the above theorems imply the main theorem of the paper. 

Proof of Theorem 1. Start with an instance $(G, \mathcal{D}, \pi)$ of SPCSF. Without loss of generality, we present an approximation guarantee of $\alpha + O(1)\epsilon$. Find $F$, $\mathcal{D}^{\mathrm{sat}}$, and $\mathcal{D}^{\mathrm{unsat}}$ by applying Theorem 4 to $(G, \mathcal{D}, \pi)$. We know that $F$ satisfies $\mathcal{D}^{\mathrm{sat}}$ and $\mathrm{length}(F) = O(\mathrm{OPT})$. Moreover, $\mathrm{OPT}_{\mathcal{D}^{\mathrm{sat}}}(G) \le \mathrm{OPT}$. Define $\pi^+(D) := \pi(D \cup \mathcal{D}^{\mathrm{unsat}})$ for all $D \subseteq \mathcal{D}$. Clearly, the optimal solution of $(G, \mathcal{D}^{\mathrm{sat}}, \pi^+)$ costs no more than $(1 + \epsilon)\mathrm{OPT}$. Pick $\epsilon' \le \epsilon \cdot \mathrm{length}(F)/\mathrm{OPT}$ and feed $(G, \mathcal{D}^{\mathrm{sat}}, \pi^+)$ along with $F$ and $\epsilon'$ to Theorem 5 in order to obtain the sets $\mathcal{D}_i$ and trees $T_i$ for $i = 1, \ldots, k$. We have $\sum_i \mathrm{length}(T_i) = O(\mathrm{length}(F)) = O(\mathrm{OPT})$, since $\epsilon'$ is a constant. In addition, the theorem guarantees a near-optimal solution $\mathrm{OPT}^+$ of cost at most $(1 + 2\epsilon)\mathrm{OPT}$ that does not use the connectivity between different components $\mathcal{D}_i$ and $\mathcal{D}_{i'}$ for $i, i' \in \{1, \ldots, k\}$ with $i \ne i'$. This ensures that the spanner construction gives us a graph $G^+$ (of total length $O(\mathrm{OPT})$) that approximates the forest of the solution within a factor of $1 + \epsilon$. Thus, the optimal solution of $(G^+, \mathcal{D}^{\mathrm{sat}}, \pi^+)$ costs at most $(1 + \epsilon)(1 + 2\epsilon)\mathrm{OPT} = [1 + O(1)\epsilon]\mathrm{OPT}$. Since the total length of $G^+$ is $O(\mathrm{OPT})$, we can use the decomposition theorem of [25] to reduce the problem to bounded-treewidth graphs with an increase of $\epsilon \cdot \mathrm{OPT}$ in the solution cost. The reduced instance is solved via the $\alpha$-approximation algorithm, and we finally get an approximation ratio of $\alpha + O(\epsilon)$. □
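The arithmetic hidden in the $O(\cdot)$ notation of the last steps can be spelled out as a short worked bound. This is an illustration under assumptions not stated explicitly in the proof: $0 < \epsilon \le 1$; the $\epsilon \cdot \mathrm{OPT}$ overhead of the decomposition of [25] is charged against the optimum of the reduced bounded-treewidth instance, written here as $\mathrm{OPT}_{\mathrm{tw}}$ (our notation); and a solution of the reduced instance maps back to one of no larger cost in $G$.

```latex
\begin{align*}
(1+\epsilon)(1+2\epsilon) &= 1 + 3\epsilon + 2\epsilon^2 \le 1 + 5\epsilon, \\
\mathrm{OPT}_{\mathrm{tw}} &\le (1+5\epsilon)\,\mathrm{OPT} + \epsilon\,\mathrm{OPT} = (1+6\epsilon)\,\mathrm{OPT}, \\
\alpha \cdot \mathrm{OPT}_{\mathrm{tw}} &\le (\alpha + 6\alpha\epsilon)\,\mathrm{OPT}.
\end{align*}
```

Since $\alpha$ is a constant, running the whole reduction with $\epsilon/(6\alpha)$ in place of $\epsilon$ yields the $(\alpha + \epsilon)$ guarantee stated in Theorem 1.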



3.1 Submodular prize-collecting clustering 

First we present and analyze a primal-dual algorithm for SPCSF, and later we see how this algorithm 
can be used to achieve the goal of identifying and removing certain demands from the optimal 
solution such that the additional penalty is negligible. 

Consider an instance $(G(V, E), \mathcal{D}, \pi)$ of SPCSF. A set $S \subseteq V$ is said to cut a demand $d = \{s, t\}$ if and only if $|S \cap d| = 1$. We denote this by the shorthand $d \odot S$ and say that the demand $d$ crosses the set $S$. In the linear program (1)-(3), there is a variable $y_{S,d}$ for any $S \subseteq V$, $d \in \mathcal{D}$ such that $d \odot S$. For convenience, we use the shorthands $y_S := \sum_{d \in \mathcal{D}} y_{S,d}$ and $y_d := \sum_{S \subseteq V} y_{S,d}$.

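$$\sum_{S:\, e \in \delta(S)} y_S \le c_e \qquad \forall e \in E \qquad (1)$$

$$\sum_{d \in D} y_d \le \pi(D) \qquad \forall D \subseteq \mathcal{D} \qquad (2)$$

$$y_{S,d} \ge 0 \qquad \forall d \in \mathcal{D},\ S \subseteq V,\ d \odot S. \qquad (3)$$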

We produce a solution to the above LP. Theorem 4 is proved via some properties of this solution. These constraints look like the dual of a natural linear program for SPCSF. For convenience, we use the notation $y(D) := \sum_{d \in D} y_d$ for any $D \subseteq \mathcal{D}$.
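Constraints (1)-(3) can be checked directly on small instances, which is handy when experimenting with the clustering procedure. The following Python sketch is illustrative only: it enumerates all $D \subseteq \mathcal{D}$ for constraint (2), so it is exponential in $|\mathcal{D}|$ (the paper instead relies on submodular minimization). The name `check_dual_lp` and the encoding of $y$ as a dictionary keyed by pairs (frozenset $S$, demand $d$) are assumptions.

```python
from itertools import chain, combinations


def crosses(d, S):
    """d crosses S: exactly one endpoint of the pair d = (s, t) lies in S."""
    return (d[0] in S) != (d[1] in S)


def check_dual_lp(y, edges, cost, demands, penalty):
    """Hypothetical brute-force check of constraints (1)-(3).

    y maps (S, d) -> value, with S a frozenset of vertices and d a demand (s, t);
    penalty is a value oracle on frozensets of demands.
    """
    # (3): nonnegativity; variables exist only for pairs with d crossing S.
    if any(val < 0 or not crosses(d, S) for (S, d), val in y.items()):
        return False
    y_S, y_d = {}, {}
    for (S, d), val in y.items():
        y_S[S] = y_S.get(S, 0) + val
        y_d[d] = y_d.get(d, 0) + val
    # (1): for each edge e, the sum of y_S over sets S with e in delta(S) is at most c_e.
    for e in edges:
        if sum(val for S, val in y_S.items() if crosses(e, S)) > cost[e]:
            return False
    # (2): y(D) <= pi(D) for every subset D of the demands (exponential enumeration).
    all_D = chain.from_iterable(combinations(demands, r) for r in range(len(demands) + 1))
    return all(sum(y_d.get(d, 0) for d in D) <= penalty(frozenset(D)) for D in all_D)
```

On the path `a-b-c` with unit-free edge costs 2 and a single demand `("a", "c")` with penalty 3, putting 1 on both $\{a\}$ and $\{c\}$ is feasible, while putting 2 on each satisfies (1) but violates (2), since $y_d = 4 > 3$.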

Lemma 6. Given an instance $(G, \mathcal{D}, \pi)$ of SPCSF, we produce in polynomial time a forest $F$ and a subset $\mathcal{D}^{\mathrm{unsat}} \subseteq \mathcal{D}$ of demands, along with a feasible vector $y$ for the above LP, such that

1. $y(\mathcal{D}^{\mathrm{unsat}}) = \pi(\mathcal{D}^{\mathrm{unsat}})$;

2. $F$ satisfies every demand in $\mathcal{D}^{\mathrm{sat}} := \mathcal{D} \setminus \mathcal{D}^{\mathrm{unsat}}$; and

3. $\mathrm{length}(F) \le 2y(\mathcal{D})$.

The solution is built up in two stages. First, we perform a submodular growth to find a forest $F_1$ and a corresponding vector $y$. This differs from the usual growth phase of [35, 1] in that the penalty function may go tight for a set of vertices that are not currently connected. In the second stage, we prune some edges of $F_1$ to obtain another forest $F_2$. Below we describe the two phases of Algorithm 1 (Submodular-PC-Clustering).



Growth. We begin with a zero vector $y$ and an empty set $F_1$. A demand $d \in \mathcal{D}$ is said to be live if and only if $y(D) < \pi(D)$ for every $D \subseteq \mathcal{D}$ with $d \in D$. If a demand is not live, it is dead. During the execution of the algorithm Submodular-PC-Clustering, we maintain a partition $\mathcal{C}$ of the vertices $V$ into clusters; it initially consists of singleton sets. Each cluster is either active or inactive; a cluster $C \in \mathcal{C}$ is active if and only if there is a live demand $d$ with $d \odot C$. We simultaneously grow all the active clusters by $\eta$. In particular, if there are $\kappa(C) > 0$ live demands crossing an active cluster $C$, we increase $y_{C,d}$ by $\eta/\kappa(C)$ for each live demand $d$ with $d \odot C$. Hence, $y_C$ is increased by $\eta$ for every active cluster $C$. We pick the largest value for $\eta$ that does not violate any of the constraints in (1) or (2). Clearly, $\eta$ is finite in each iteration, because the values of these variables cannot be larger than $\pi(\mathcal{D})$. Hence, at least one such constraint goes tight after each growth step. If this happens for an edge constraint for $e = (u, v)$, then there are two clusters $C_u \ni u$ and $C_v \ni v$ in $\mathcal{C}$, at least one of which is growing. We merge the two clusters into $C = C_u \cup C_v$ by adding the edge $e$ to $F_1$, remove the old clusters, and add the new one to $\mathcal{C}$. Nothing needs to be done if a constraint (2) becomes tight. The number of iterations is at most $|V| + |\mathcal{D}|$, because at each event either a demand dies or the size of $\mathcal{C}$ decreases.
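The growth phase can be sketched concretely in the special case of additive penalties (plain PCSF), where constraint (2) reduces to $y_d \le \pi(d)$ for each demand, a demand is live iff $y_d < \pi(d)$, and $\eta$ follows from a simple ratio test instead of the auxiliary LP needed for general submodular $\pi$. The Python sketch below is a hypothetical illustration of that special case only; the name `pc_growth_additive` and its input encoding (edges and demands as vertex pairs, `cost` and `penalty` as dictionaries) are assumptions, and it uses exact `Fraction` arithmetic so that tightness tests are reliable. The loop stops once no cluster is crossed by a live demand.

```python
from fractions import Fraction


def pc_growth_additive(vertices, edges, cost, demands, penalty):
    """Hypothetical sketch of the growth phase for ADDITIVE penalties only.

    Assumes pi(D) = sum(penalty[d] for d in D), so constraint (2) reduces to
    y_d <= penalty[d] and a demand d is live iff y_d < penalty[d].
    Returns (F1, clusters formed, live demands, y_d).
    """
    cluster = {v: frozenset([v]) for v in vertices}  # vertex -> its current cluster
    formed = set(cluster.values())                    # laminar family of all clusters
    load = {e: Fraction(0) for e in edges}            # left-hand side of constraint (1)
    y_d = {d: Fraction(0) for d in demands}
    F1 = []

    def live(d):
        return y_d[d] < penalty[d]

    while True:
        # kappa(C) = number of live demands crossing the current cluster C.
        kappa = {}
        for s, t in demands:
            if live((s, t)) and cluster[s] != cluster[t]:
                kappa[cluster[s]] = kappa.get(cluster[s], 0) + 1
                kappa[cluster[t]] = kappa.get(cluster[t], 0) + 1
        if not kappa:  # no active cluster is left
            break
        # y_d grows by eta / kappa(C) for each of the two clusters that d crosses.
        d_rate = {(s, t): Fraction(1, kappa[cluster[s]]) + Fraction(1, kappa[cluster[t]])
                  for s, t in demands if live((s, t)) and cluster[s] != cluster[t]}
        # An edge between two clusters is loaded at rate 1 per active endpoint cluster.
        e_rate = {(u, v): (cluster[u] in kappa) + (cluster[v] in kappa)
                  for u, v in edges if cluster[u] != cluster[v]}
        # Ratio test: the largest eta that keeps constraints (1) and (2) feasible.
        eta = min([(cost[e] - load[e]) / r for e, r in e_rate.items() if r > 0]
                  + [(penalty[d] - y_d[d]) / r for d, r in d_rate.items()])
        for e, r in e_rate.items():
            load[e] += r * eta
        for d, r in d_rate.items():
            y_d[d] += r * eta
        # Merge the two clusters joined by one tight edge, if any.
        tight = [e for e in e_rate if load[e] == cost[e]]
        if tight:
            u, v = tight[0]
            merged = cluster[u] | cluster[v]
            for x in merged:
                cluster[x] = merged
            formed.add(merged)
            F1.append(tight[0])
    return F1, formed, {d for d in demands if live(d)}, y_d
```

On the path `a-b-c` with edge costs 2 and one demand `("a", "c")`, a penalty of 10 makes both edges go tight and the demand stays live with $y_d = 4$, whereas a penalty of 1 makes the demand die at $y_d = 1$ before any edge is bought, matching property 1 of Lemma 6.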

Computing $\eta$ is nontrivial here. In particular, we have to solve an auxiliary linear program to find its value. New variables $y^*_{S,d}$ denote the value of the vector $y$ after a growth of size $\eta$. All the constraints are written for the new variables. There are exponentially many constraints in this LP; however, it admits a separation oracle and thus can be optimized.$^6$



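$$\text{maximize} \quad \eta \qquad (4)$$

subject to

$$y^*_{S,d} = y_{S,d} + \frac{\eta}{\kappa(S)} \qquad \forall d \in \mathcal{D},\ S \subseteq V,\ d \odot S,\ \kappa(S) > 0 \qquad (5)$$

$$y^*_{S,d} = y_{S,d} \qquad \forall d \in \mathcal{D},\ S \subseteq V,\ d \odot S,\ \kappa(S) = 0 \qquad (6)$$

$$\sum_{S:\, e \in \delta(S)} y^*_S \le c_e \qquad \forall e \in E \qquad (7)$$

$$\sum_{d \in D} y^*_d \le \pi(D) \qquad \forall D \subseteq \mathcal{D} \qquad (8)$$

$$y^*_{S,d} \ge 0 \qquad \forall d \in \mathcal{D},\ S \subseteq V,\ d \odot S. \qquad (9)$$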



6 Notice that there are only a polynomial number of non-zero variables at each step, since $y_{S,d}$ may be non-zero only for clusters $S$, and these clusters form a laminar family in our algorithm. Verifying constraints (5)-(7) and (9) is very simple. Verifying constraints (8) is equivalent to computing $\min_{D \subseteq \mathcal{D}} \pi(D) - y^*(D)$ and checking that it is non-negative. The function to minimize is submodular (a submodular function minus the modular function $y^*$) and thus can be minimized in polynomial time [43]. A standard argument shows that the values of these variables have polynomial size. We defer to the full version of the paper the detailed discussion of how the LP can be approximated.






Pruning. Let $\mathcal{S}$ denote the set of all clusters formed during the execution of the growth step. It is easy to observe that the clusters in $\mathcal{S}$ form a laminar family and that the maximal clusters are the clusters of $\mathcal{C}$. In addition, notice that $F_1[C]$ is connected for each $C \in \mathcal{S}$.

Let $\mathcal{B} \subseteq \mathcal{S}$ be the set of all clusters $C$ that do not cut any live demand. Notice that a demand $d$ may still be live at the end of the growth stage if it is satisfied; roughly speaking, the demand is satisfied before it exhausts its potential. In the pruning stage, we iteratively remove edges from $F_1$ to obtain $F_2$. More specifically, we first initialize $F_2$ with $F_1$. Then, as long as there is a cluster $S \in \mathcal{B}$ such that $F_2 \cap \delta(S) = \{e\}$, we remove the edge $e$ from $F_2$.

A cluster $C$ is called a pruned cluster if it is pruned in the second stage, in which case $\delta(C) \cap F_2 = \emptyset$. Hence, a pruned cluster cannot have a non-empty and proper intersection with a connected component of $F_2$.

Algorithm 1 Submodular-PC-Clustering 

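Input: Instance $(G(V, E), \mathcal{D}, \pi)$ of Generalized prize-collecting Steiner forest
Output: Forest $F$, subset of demands $\mathcal{D}_{\mathrm{unsat}}$ and fractional solution $y$.

1: Let $F_1 \leftarrow \emptyset$.
2: Let $y_{S,d} \leftarrow 0$ for any $d \in \mathcal{D}$, $S \subseteq V$, $d \odot S$.
3: Let $\mathcal{S} \leftarrow \mathcal{C} \leftarrow \{\{v\} : v \in V\}$.
4: while there is a live demand do
5:     Compute $\eta$ via LP (4): the largest possible value such that simultaneously increasing $y_C$ by $\eta$ for all active clusters $C \in \mathcal{C}$ does not violate Constraints (1)-(3).
6:     Let $y_{C,d} \leftarrow y_{C,d} + \eta$ for all live demands $d$ crossing clusters $C \in \mathcal{C}$, i.e., $d \odot C$.
7:     if $\exists e \in E$ that is tight and connects two clusters $C_1$ and $C_2$ then
8:         Pick one such edge $e = (u, v)$.
9:         Let $F_1 \leftarrow F_1 \cup \{e\}$.
10:        Let $C \leftarrow C_1 \cup C_2$.
11:        Let $\mathcal{C} \leftarrow \mathcal{C} \cup \{C\} \setminus \{C_1, C_2\}$.
12:        Let $\mathcal{S} \leftarrow \mathcal{S} \cup \{C\}$.
13: Let $F_2 \leftarrow F_1$.
14: Let $\mathcal{B}$ be the set of all clusters $S \in \mathcal{S}$ that do not cut any live demands.
15: while $\exists S \in \mathcal{B}$ such that $F_2 \cap \delta(S) = \{e\}$ for an edge $e$ do
16:     Let $F_2 \leftarrow F_2 \setminus \{e\}$.
17: Let $\mathcal{D}_{\mathrm{unsat}}$ denote the set of dead demands.
18: Output $F := F_2$, $\mathcal{D}_{\mathrm{unsat}}$ and $y$.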



We first bound the length of the forest $F$. The following lemma is similar to the analysis of the algorithm in [35]. However, we do not have a primal LP to give a bound on the dual. Rather, the upper bound for the length is $\pi(\mathcal{D})$. In addition, we bound the cost of a forest $F$ that may have more than one connected component, whereas the prize-collecting Steiner tree algorithm of [35] finds a connected graph at the end.

Lemma 7. The cost of $F_2$ is at most $2y(\mathcal{D})$.

Proof. Recall that the growth phase has several events corresponding to an edge or set constraint going tight. We first break apart the $y$ variables by epoch. Let $t_j$ be the time at which the $j$th event point occurs in the growth phase ($0 = t_0 < t_1 < t_2 < \cdots$), so the $j$th epoch is the interval of time from $t_{j-1}$ to $t_j$. For each cluster $C$, let $y_C^{(j)}$ be the amount by which $y_C$ grew during epoch $j$, which is $t_j - t_{j-1}$ if $C$ was active during this epoch, and zero otherwise. Thus, $y_C = \sum_j y_C^{(j)}$. Because each edge $e$ of $F_2$ was added at some point by the growth stage when its edge packing constraint (1) became tight, we can exactly apportion the cost $c_e$ amongst the collection of clusters $\{C : e \in \delta(C)\}$ whose variables "pay for" the edge, and can divide this up further by epoch. In other words, $c_e = \sum_j \sum_{C:\, e \in \delta(C)} y_C^{(j)}$. We will now prove that the total edge cost from $F_2$ that is apportioned to epoch $j$ is at most $2\sum_C y_C^{(j)}$. In other words, during each epoch, the total rate at which edges of $F_2$ are paid for by all active clusters is at most twice the number of active clusters. Summing over the epochs yields the desired conclusion.

We now analyze an arbitrary epoch $j$. Let $\mathcal{C}_j$ denote the set of clusters that existed during epoch $j$. Consider the graph $F_2$, and then collapse each cluster $C \in \mathcal{C}_j$ into a supernode. Call the resulting graph $H$. Although the nodes of $H$ are identified with clusters in $\mathcal{C}_j$, we will continue to refer to them as clusters, in order to avoid confusion with the nodes of the original graph. Some of the clusters are active and some may be inactive. Let us denote the active and inactive clusters in $\mathcal{C}_j$ by $\mathcal{C}_{\mathrm{act}}$ and $\mathcal{C}_{\mathrm{dead}}$, respectively. The edges of $F_2$ that are being partially paid for during epoch $j$ are exactly those edges of $H$ that are incident to an active cluster, and the total amount of these edges that is paid off during epoch $j$ is $(t_j - t_{j-1}) \sum_{C \in \mathcal{C}_{\mathrm{act}}} \deg_H(C)$. Since every active cluster grows by exactly $t_j - t_{j-1}$ in epoch $j$, we have $\sum_C y_C^{(j)} = \sum_{C \in \mathcal{C}_{\mathrm{act}}} y_C^{(j)} = (t_j - t_{j-1})|\mathcal{C}_{\mathrm{act}}|$. Thus, it suffices to show that $\sum_{C \in \mathcal{C}_{\mathrm{act}}} \deg_H(C) \le 2|\mathcal{C}_{\mathrm{act}}|$.

First we must make some simple observations about $H$. Since $F_2$ is a subset of the edges in $F_1$, and each cluster represents a disjoint induced connected subtree of $F_1$, the contraction to $H$ introduces no cycles. Thus, $H$ is a forest. All the leaves of $H$ must be live clusters, because otherwise the corresponding cluster $C$ would be in $\mathcal{B}$ and hence would have been pruned away.

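With this information about $H$, it is easy to bound $\sum_{C \in \mathcal{C}_{\mathrm{act}}} \deg_H(C)$. Dead clusters of degree zero contribute nothing to this sum and can be discarded from $H$; every remaining dead cluster has degree at least two, since all leaves are active. As $H$ is a forest, its total degree is at most $2(|\mathcal{C}_{\mathrm{act}}| + |\mathcal{C}_{\mathrm{dead}}|)$, and therefore
$$\sum_{C \in \mathcal{C}_{\mathrm{act}}} \deg_H(C) \le 2(|\mathcal{C}_{\mathrm{act}}| + |\mathcal{C}_{\mathrm{dead}}|) - 2|\mathcal{C}_{\mathrm{dead}}| = 2|\mathcal{C}_{\mathrm{act}}|,$$
as desired. □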
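The counting argument can be sanity-checked numerically. The sketch below assumes $H$ is a single tree (so no cluster is isolated), forces every leaf to be active as in the proof, marks the remaining clusters active or dead at random, and checks $\sum_{C \in \mathcal{C}_{\mathrm{act}}} \deg_H(C) \le 2|\mathcal{C}_{\mathrm{act}}|$. It is a test of the inequality only, not part of the algorithm, and the function names are hypothetical.

```python
import random


def degrees(edges):
    """Degree of every node appearing in the edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def bound_holds(n, seed):
    """Random tree H on clusters 0..n-1 (n >= 2); leaves are active, other clusters are
    active or dead at random. Returns whether sum of deg_H(C) over active C <= 2 |active|."""
    rng = random.Random(seed)
    edges = [(i, rng.randrange(i)) for i in range(1, n)]  # cluster i attaches to an earlier one
    deg = degrees(edges)
    active = {c for c in range(n) if deg[c] == 1 or rng.random() < 0.5}
    return sum(deg[c] for c in active) <= 2 * len(active)


# Usage: check many random trees of different sizes.
all_ok = all(bound_holds(n, s) for n in range(2, 40) for s in range(10))
```

In a tree the total degree is $2(|V(H)| - 1)$, so the check in fact always holds with a slack of 2.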

Now we can prove Lemma 6 that characterizes the output of Submodular-PC-Clustering. 

Proof of Lemma 6. For every demand $d \in \mathcal{D}_{\mathrm{unsat}}$ we have a set $D \ni d$ such that $y(D) = \pi(D)$. The definition of $\mathcal{D}_{\mathrm{unsat}}$ guarantees $D \subseteq \mathcal{D}_{\mathrm{unsat}}$. Therefore, we have sets $D_1, D_2, \ldots, D_l$ that are all tight (i.e., $y(D_i) = \pi(D_i)$) and they span $\mathcal{D}_{\mathrm{unsat}}$ (i.e., $\mathcal{D}_{\mathrm{unsat}} = \bigcup_i D_i$). To prove $y(\mathcal{D}_{\mathrm{unsat}}) = \pi(\mathcal{D}_{\mathrm{unsat}})$, we use induction and combine the $D_i$'s two at a time. For any two tight sets $A$ and $B$ we have
$$y(A \cup B) = y(A) + y(B) - y(A \cap B) = \pi(A) + \pi(B) - y(A \cap B) \ge \pi(A) + \pi(B) - \pi(A \cap B) \ge \pi(A \cup B),$$
where the second equation follows from the tightness of $A$ and $B$, the third step is a result of Constraint (2), and the last step follows from submodularity. Constraint (2) states that $\pi(A \cup B) \ge y(A \cup B)$; therefore, it has to hold with equality.
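The closure argument can be checked on a small example. In this sketch (an illustration, not the paper's construction) the penalty is a hypothetical coverage function, and $y$ is built by the greedy rule $y_d = \pi(\text{prefix up to } d) - \pi(\text{prefix before } d)$, which for a submodular $\pi$ with $\pi(\emptyset) = 0$ satisfies $y(D) \le \pi(D)$ for every $D$, mirroring Constraint (2). The code then lists the tight sets and confirms that the union of any two of them is tight.

```python
from itertools import combinations


def coverage(cover):
    """Hypothetical submodular penalty with pi(empty) = 0: size of the union of cover[d]."""
    def pi(D):
        covered = set()
        for d in D:
            covered |= cover[d]
        return len(covered)
    return pi


def greedy_point(order, pi):
    """y_d = pi(prefix up to d) - pi(prefix before d); then y(D) <= pi(D) for every D."""
    y, prefix = {}, []
    for d in order:
        before = pi(prefix)
        prefix.append(d)
        y[d] = pi(prefix) - before
    return y


def all_subsets(items):
    """All non-empty subsets of `items` as frozensets."""
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def tight_sets(demands, pi, y):
    """Non-empty sets D with y(D) = pi(D)."""
    return {D for D in all_subsets(demands) if sum(y[d] for d in D) == pi(D)}


# Usage: four demands with overlapping coverage.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {5}}
pi = coverage(cover)
y = greedy_point(list(cover), pi)
T = tight_sets(list(cover), pi, y)
```

Here the tight sets include, e.g., $\{a\}$ and $\{d\}$, and their union $\{a, d\}$ is tight as well, as the proof predicts.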

Clearly, at the end of the execution of Submodular-PC-Clustering, any live demand is already satisfied. Notice that such demands are not affected in the pruning stage. Hence, only dead demands may be unsatisfied. This guarantees the second condition. The third condition follows from Lemma 7. □
3.2 Restricting the demands

We prove Theorem 4 in this section. First, we obtain a constant-factor approximate solution $F^+$; this can be done, e.g., via the 3-approximation algorithm for general graphs [39]. Let $\mathcal{D}^+$ denote the demands satisfied by $F^+$. We denote by $T_j^+$ the connected components of $F^+$. For each demand $d = \{s,t\} \in \mathcal{D}^+$ we clearly have $\{s,t\} \subseteq V(T_j^+)$ for some $j$. However, for an unsatisfied demand $d' = \{s',t'\} \in \mathcal{D} \setminus \mathcal{D}^+$, the vertices $s'$ and $t'$ belong to two different components of $F^+$. Construct $G^*$ from $G$ by reducing the length of the edges of $F^+$ to zero. The new penalty function $\pi^*$ is defined as follows:

$$\pi^*(D) := \varepsilon^{-1}\pi(D) \qquad \text{for } D \subseteq \mathcal{D}. \qquad (10)$$

Finally, we run Submodular-PC-Clustering on $(G^*, \mathcal{D}, \pi^*)$; see Algorithm 2.
Algorithm 2 Restrict-Demands 

Input: Instance $(G, \mathcal{D}, \pi)$ of Submodular Prize-Collecting Steiner Forest
Output: Forest $F$ and $\mathcal{D}_{\mathrm{unsat}}$.

1: Use the algorithm of Hajiaghayi et al. [39] to find a 3-approximate solution: a forest $F^+$ satisfying a subset $\mathcal{D}^+$ of demands.
2: Construct $G^*(V, E^*)$ in which $E^*$ is the same as $E$ except that the edges of $F^+$ have length zero in $E^*$.
3: Define $\pi^*$ as in Equation (10).
4: Call Submodular-PC-Clustering on $(G^*, \mathcal{D}, \pi^*)$ to obtain $F$, $\mathcal{D}_{\mathrm{unsat}}$ and $y$.
5: Output $F$ and $\mathcal{D}_{\mathrm{unsat}}$.
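Steps 2 and 3 of Restrict-Demands amount to a simple transformation of the input. The sketch below assumes edges are stored as `frozenset` pairs with a length each and that $F^+$ has already been computed; step 1 (the algorithm of [39]) and step 4 are not implemented, and the function name is hypothetical.

```python
def restrict_demands_setup(lengths, F_plus, pi, eps):
    """Steps 2-3 of Restrict-Demands (sketch): build the edge lengths of G* and pi*.

    lengths: dict edge -> length in G (edges as frozensets {u, v})
    F_plus:  set of edges of the 3-approximate forest F+ (computed elsewhere)
    pi:      penalty function on sets of demands
    eps:     the accuracy parameter epsilon > 0
    """
    # Step 2: edges of F+ get length zero in G*; all other lengths are unchanged.
    lengths_star = {e: (0 if e in F_plus else c) for e, c in lengths.items()}

    # Step 3, Equation (10): pi*(D) = pi(D) / eps.
    def pi_star(D):
        return pi(D) / eps

    return lengths_star, pi_star


# Usage: a two-edge path where F+ uses the first edge; the penalty counts demands.
lengths = {frozenset("ab"): 3, frozenset("bc"): 5}
ls, ps = restrict_demands_setup(lengths, {frozenset("ab")}, lambda D: len(D), 0.5)
```

Scaling the penalty by $\varepsilon^{-1}$ makes dropping a demand expensive relative to the (already zeroed) edges of $F^+$, which is what the proof of Theorem 4 exploits.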



Now we show that the algorithm Restrict-Demands outlined above satisfies the requirements 
of Theorem 4. Before doing so, we show how the cost of a forest can be compared to the values of 
the output vector y. 

Lemma 8. If a graph $F$ satisfies a set $\mathcal{D}_{\mathrm{sat}}$ of demands, then $\mathrm{length}(F) \ge \sum_{d \in \mathcal{D}_{\mathrm{sat}}} y_d$.

This is quite intuitive. Recall that the $y$ variables color the edges of the graph. Consider a segment on edges corresponding to cluster $S$ with color $d$, where $d \in \mathcal{D}_{\mathrm{sat}}$. Since $d$ crosses $S$ and $F$ satisfies $d$, at least one edge of $F$ passes through the cut $(S, \bar{S})$. Thus, a portion of the cost of $F$ can be charged to $y_{S,d}$. Hence, the total cost of the graph $F$ is at least as large as the total amount of colors paid for by $\mathcal{D}_{\mathrm{sat}}$. We now provide a formal proof.
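Proof. The length of the graph $F$ is
$$
\begin{aligned}
\mathrm{length}(F) = \sum_{e \in F} c_e
&\ge \sum_{e \in F}\ \sum_{S:\, e \in \delta(S)} y_S
 = \sum_{S} |F \cap \delta(S)|\, y_S
 \ge \sum_{S:\, F \cap \delta(S) \neq \emptyset} y_S \\
&= \sum_{S:\, F \cap \delta(S) \neq \emptyset}\ \sum_{d:\, d \odot S} y_{S,d}
 = \sum_{d}\ \sum_{\substack{S:\, d \odot S\\ F \cap \delta(S) \neq \emptyset}} y_{S,d} \\
&\ge \sum_{d \in \mathcal{D}_{\mathrm{sat}}}\ \sum_{\substack{S:\, d \odot S\\ F \cap \delta(S) \neq \emptyset}} y_{S,d}
 = \sum_{d \in \mathcal{D}_{\mathrm{sat}}}\ \sum_{S:\, d \odot S} y_{S,d}
 = \sum_{d \in \mathcal{D}_{\mathrm{sat}}} y_d,
\end{aligned}
$$
where $y_S = \sum_{d:\, d \odot S} y_{S,d}$, the first inequality is the edge packing constraint (1), and the second-to-last equality holds because $F$ satisfies every $d \in \mathcal{D}_{\mathrm{sat}}$, so no set $S$ with $d \odot S$ and $F \cap \delta(S) = \emptyset$ exists. □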

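Proof of Theorem 4. We know that $\mathrm{length}(F^+) + \pi(\mathcal{D} \setminus \mathcal{D}^+) \le 3\,\mathrm{OPT}$ because we start with a 3-approximate solution. For any demand $d = (s,t)$, we know that $y_d$ is not more than the distance between $s$ and $t$ in $G^*$. Since the distance between the endpoints of $d$ is zero if $d \in \mathcal{D}^+$, $y_d$ is non-zero only if $d \in \mathcal{D} \setminus \mathcal{D}^+$; hence $y(\mathcal{D}) = y(\mathcal{D} \setminus \mathcal{D}^+) \le \pi^*(\mathcal{D} \setminus \mathcal{D}^+)$ by Constraint (2). By Lemma 6, the length of $F$ in $G^*$, denoted by $\mathrm{length}_{G^*}(F)$, is at most $2y(\mathcal{D}) \le 2\pi^*(\mathcal{D} \setminus \mathcal{D}^+) = 2\varepsilon^{-1}\pi(\mathcal{D} \setminus \mathcal{D}^+) \le 6\varepsilon^{-1}\,\mathrm{OPT}$. Therefore, $\mathrm{length}(F) \le \mathrm{length}(F^+) + \mathrm{length}_{G^*}(F) \le (6\varepsilon^{-1} + 3)\,\mathrm{OPT}$.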

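To establish the second condition of the theorem, take an optimal forest $F'$: $F'$ satisfies the demands $\mathcal{D}_{\mathrm{OPT}}$, and we have $\mathrm{length}(F') + \pi(\mathcal{D} \setminus \mathcal{D}_{\mathrm{OPT}}) = \mathrm{OPT}$. Define $A := \mathcal{D}_{\mathrm{OPT}} \setminus \mathcal{D}_{\mathrm{sat}}$ and $B := \mathcal{D}_{\mathrm{unsat}} \setminus A$. The penalty of $F'$ under $\pi'$ is $\pi((\mathcal{D} \setminus \mathcal{D}_{\mathrm{OPT}}) \cup \mathcal{D}_{\mathrm{unsat}}) = \pi((\mathcal{D}_{\mathrm{sat}} \setminus \mathcal{D}_{\mathrm{OPT}}) \cup A \cup B)$. Hence, the increase in the penalty of $F'$ due to changing from $\pi$ to $\pi'$ is
$$\pi((\mathcal{D}_{\mathrm{sat}} \setminus \mathcal{D}_{\mathrm{OPT}}) \cup A \cup B) - \pi((\mathcal{D}_{\mathrm{sat}} \setminus \mathcal{D}_{\mathrm{OPT}}) \cup B) \le \pi(A \cup B) - \pi(B)$$
due to the decreasing marginal cost property of submodular functions. We have $y(A \cup B) = \pi^*(A \cup B) = \varepsilon^{-1}\pi(A \cup B)$ because $A \cup B = \mathcal{D}_{\mathrm{unsat}}$ is the set of dead demands of Submodular-PC-Clustering; see the first condition of Lemma 6. We also have $\varepsilon^{-1}\pi(B) = \pi^*(B) \ge y(B)$ because of Constraint (2). Therefore, the additional penalty is at most $\varepsilon[y(A \cup B) - y(B)] = \varepsilon\, y(A)$. Since $F'$ satisfies the demands in $A$, we have $y(A) \le \mathrm{length}(F') \le \mathrm{OPT}$ by Lemma 8. Therefore, the additional penalty is at most $\varepsilon\,\mathrm{OPT}$.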
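The only property of $\pi$ used in the first inequality above is diminishing marginal returns. The following sketch checks $\pi(X \cup A \cup B) - \pi(X \cup B) \le \pi(A \cup B) - \pi(B)$, with $X$ playing the role of $\mathcal{D}_{\mathrm{sat}} \setminus \mathcal{D}_{\mathrm{OPT}}$, on random instances of a hypothetical coverage penalty. It only illustrates the inequality on examples and proves nothing.

```python
import random


def coverage(cover):
    """Hypothetical submodular penalty: pi(D) = size of the union of cover[d] over d in D."""
    def pi(D):
        covered = set()
        for d in D:
            covered |= cover[d]
        return len(covered)
    return pi


def diminishing_returns_holds(trials=500, seed=1):
    """Check pi(X|A|B) - pi(X|B) <= pi(A|B) - pi(B) on random disjoint X, A, B."""
    rng = random.Random(seed)
    demands = list(range(8))
    for _ in range(trials):
        cover = {d: set(rng.sample(range(10), rng.randint(0, 4))) for d in demands}
        pi = coverage(cover)
        label = {d: rng.choice("XAB-") for d in demands}  # '-' means: in none of X, A, B
        X, A, B = ({d for d in demands if label[d] == c} for c in "XAB")
        if pi(X | A | B) - pi(X | B) > pi(A | B) - pi(B):
            return False
    return True
```

Adding $A$ to the larger set $X \cup B$ can only gain fewer newly covered elements than adding it to $B$ alone, which is exactly the submodularity step in the proof.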

The extension to SPCTSP and SPCS is straightforward once we observe that the cost of building a tour or a stroll on a subset $S$ of vertices is at least the cost of constructing a Steiner tree on the same set. Hence, the algorithm pretends it has an SPCST instance and restricts the demand set accordingly. The extra penalty due to the ignored demands $\mathcal{D}_{\mathrm{unsat}}$ is then charged to the Steiner tree cost, which is no more than the TSP or stroll length. □
3.3 Restricting the connectivity

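We first run Restrict-Demands on $(G, \mathcal{D}, \pi)$. Let $F$ and $\mathcal{D}_{\mathrm{unsat}}$ be its output. The forest $F$ satisfies all the demands in $\mathcal{D}_{\mathrm{sat}} := \mathcal{D} \setminus \mathcal{D}_{\mathrm{unsat}}$. The length of this forest is $O(\mathrm{OPT})$, and the demands in $\mathcal{D}_{\mathrm{unsat}}$ can be safely ignored.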

The forest $F$ consists of tree components $T_j$. In the following, we connect some of these components to make the trees $T'_i$. It is easy to see that this construction guarantees the first two conditions of Theorem 5. We work on a graph $G^*(V^*, E^*)$ formed from $G$ by contracting each tree component of $F$. A potential $\phi_v$ is associated with each vertex $v$ of $G^*$: it is $\varepsilon^{-1}$ times the length of the tree component corresponding to $v$ if $v$ is the contraction of a tree component, and zero otherwise.

We use the algorithm PC-Clustering introduced in [12] to cluster the components $T_j$ and construct a forest $F_2$ with components $T'_i$; the details of the algorithm can be found in [12]. We obtain the following guarantees.

We first show that the cost of the new edges is small.

Lemma 9 ([12, Lemma 6]). The cost of $F_2$ is at most $2\sum_{v \in V^*} \phi_v$.

Recall that the trees $T_j$ are contracted in $F_2$. Construct $F'$ from $F_2$ by uncontracting all these trees. Let $F'$ consist of tree components $T'_i$. It is not difficult to verify that $F'$ is indeed a forest, but we do not need this condition, since we can always remove cycles to find a forest. Define $\mathcal{D}_i := \{(s,t) \in \mathcal{D} : s, t \in V(T'_i)\}$, and let $\mathcal{D}^*$ be the subset of demands satisfied by OPT. Define $\mathcal{D}^*_i := \mathcal{D}^* \cap \mathcal{D}_i$, and denote by $\mathrm{SteinerForest}(G, \mathcal{D})$ the length of a minimum Steiner forest of $G$ satisfying the demands $\mathcal{D}$.

Lemma 10 ([12, Lemma 10]). $\sum_i \mathrm{SteinerForest}(G, \mathcal{D}^*_i) \le (1 + \varepsilon)\,\mathrm{SteinerForest}(G, \mathcal{D}^*)$.

Now, we are ready to prove the main theorem of this section. 

Proof of Theorem 5. The first condition of the theorem follows directly from our construction: we start with a solution and never disconnect any of the tree components in the process. The construction immediately implies the second condition. By Lemma 9, the cost of $F_2$ is at most $2\sum_{v \in V^*} \phi_v = 2\varepsilon^{-1}\,\mathrm{length}(F)$. Thus, $F'$ costs no more than $(2/\varepsilon + 1)\,\mathrm{length}(F)$, giving the third condition. Finally, Lemma 10 establishes the last condition. □

4 PTASs for PCST, PCTSP and PCS on planar graphs 

Since PCST is a special case of PCSF, Theorems 1 and 2 imply that PCST admits a PTAS on planar graphs. However, obtaining the same result for PCTSP and PCS is not immediate from those theorems, since the latter problems are not special cases of PCSF. Here we explain how we can use these theorems to obtain the desired PTASs. We focus on PCTSP; the same arguments with minor changes apply to PCS as well.

Take an instance $I = (G, \mathcal{D}, \pi)$ of PCTSP, and apply Theorem 4 to $I$ to obtain $F$ and $\mathcal{D}_{\mathrm{unsat}}$. Since all the demands share a common root vertex$^{7}$, all the terminals in $\mathcal{D}_{\mathrm{sat}}$ are connected in $F$. We then invoke the TSP spanner construction of Arora et al. [6] to build $H$. Finally, we use the contraction decomposition theorem of Demaine et al. [25] to contract a small-weight subset of edges and reduce the problem to graphs of bounded treewidth. The total additional charge due to the penalties of $\mathcal{D}_{\mathrm{unsat}}$ and the contracted edges is at most $O(\varepsilon)\,\mathrm{OPT}$. Therefore, we can obtain a PTAS by solving the bounded-treewidth instance exactly.

7 If we have a penalty for each vertex in the PCTSP formulation, we can guess a root vertex $r$ and define the demand pairs accordingly.

5 Hardness of PCSF on series-parallel graphs 

We first present the hardness proof for PCSF on a planar graph of treewidth two. The proof shows 
hardness for a very restricted class of graphs: short cycles going through a single central vertex. 

Proof of Theorem 3(1). We reduce an instance $I$ of Vertex Cover on 3-regular graphs to an instance $I'$ of PCSF on planar graphs of treewidth two. The former problem is known to be APX-hard [3]. The instance $I$ is defined by an undirected graph $G$. If $n$ denotes the number of vertices of $G$, the number of edges is $m = 3n/2$. We will denote the $i$-th vertex of $G$ by $v_i$, the $j$-th edge by $e_j$, and the first and second endpoints of $e_j$ by $e_j^{(1)}$ and $e_j^{(2)}$, respectively.

We now specify the reduction (illustrated in Figure 1); $I'$ is represented by $(H, \mathcal{D}, \pi)$. The graph $H$ consists of the vertices

• $a_i$ for $1 \le i \le n$,

• $b_j$, $c_j^1$, $c_j^2$ for $1 \le j \le m$,

• central vertex $w$,
and the edges

• $\{w, a_i\}$ of cost 2 ($1 \le i \le n$),

• $\{w, c_j^1\}$, $\{w, c_j^2\}$, $\{c_j^1, b_j\}$, $\{c_j^2, b_j\}$ of cost 1 ($1 \le j \le m$).
The instance contains the following demands:

• $\{w, b_j\}$ with penalty 3 ($1 \le j \le m$),

• if $v_i = e_j^{(\ell)}$ for some $1 \le i \le n$, $1 \le j \le m$, and $\ell \in \{1, 2\}$, then $\{a_i, c_j^\ell\}$ is a demand with penalty 1.

Thus the number of demands is exactly $m + 3n$ (the $m$ demands $\{w, b_j\}$ plus one demand for each of the $2m = 3n$ edge endpoints), and each $a_i$ appears in exactly 3 demands. We claim that the cost of the optimum solution of $I'$ is exactly $2m + 2n + \tau(G)$, where $\tau(G)$ is the size of a minimum vertex cover in $G$. Note that $\tau(G) \ge n/3$ (as $G$ is 3-regular); thus $2m + 2n + \tau(G)$ is at most a constant times $\tau(G)$. In order to prove the correctness of the reduction, we prove the following two statements:

(1) Given a vertex cover of size k for G, a solution of cost 2m + 2n + k can be constructed. 

(2) Given a solution of cost at most $2m + 2n + k$, a vertex cover of size at most $k$ can be constructed.

To prove (1), suppose that $C$ is a vertex cover of size $k$ for $G$. Let $T$ be a tree of $H$ that contains

• edge $\{w, a_i\}$ if and only if $v_i \notin C$,

• edges $\{w, c_j^1\}$, $\{c_j^1, b_j\}$ if and only if $e_j^{(1)} \notin C$,

• edges $\{w, c_j^2\}$, $\{c_j^2, b_j\}$ if and only if $e_j^{(1)} \in C$.

Figure 1: Illustrating the reduction from 3-Regular Vertex Cover to PCSF.

The total cost of $T$ is $2(n - k) + 2m$. Observe that all the demands $\{w, b_j\}$ are connected (either via $c_j^1$ or via $c_j^2$). Furthermore, if $v_i \notin C$, then all three demands where $a_i$ appears are satisfied: edge $\{w, a_i\}$ is in $T$, and if $v_i = e_j^{(1)}$, then edge $\{w, c_j^1\}$ is in $T$ as well. (Note that if $v_i = e_j^{(2)}$ and $v_i \notin C$, then $e_j^{(1)} \in C$ must hold, and therefore $\{w, c_j^2\}$ is in $T$.) Thus the total penalty is at most $3k$, and hence the cost of the solution is at most $2n + 2m + k$, as claimed.
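To make statement (1) concrete, the sketch below builds the instance $(H, \mathcal{D}, \pi)$ and the tree $T$ for a given vertex cover, and evaluates cost plus penalty with a union-find structure. Vertex names such as `("c", j, ell)` for $c_j^\ell$ are hypothetical encodings, and for each edge the first listed endpoint plays the role of $e_j^{(1)}$. On $K_4$ (3-regular, $n = 4$, $m = 6$) with the cover $\{0, 1, 2\}$ of size 3, the total is $2m + 2n + k = 23$.

```python
def build_instance(n, edges):
    """Graph H and demands of the reduction; vertices of G are 0..n-1.

    For edge j = (u, v) of G, u plays e_j^(1) and v plays e_j^(2).
    Returns (cost: dict edge -> cost, demands: dict pair -> penalty).
    """
    cost, demands = {}, {}
    for i in range(n):
        cost[("w", ("a", i))] = 2                     # {w, a_i} of cost 2
    for j, (u, v) in enumerate(edges):
        for ell in (1, 2):
            cost[("w", ("c", j, ell))] = 1            # {w, c_j^ell} of cost 1
            cost[(("c", j, ell), ("b", j))] = 1       # {c_j^ell, b_j} of cost 1
        demands[("w", ("b", j))] = 3                  # {w, b_j} with penalty 3
        demands[(("a", u), ("c", j, 1))] = 1          # {a_i, c_j^1} for v_i = e_j^(1)
        demands[(("a", v), ("c", j, 2))] = 1          # {a_i, c_j^2} for v_i = e_j^(2)
    return cost, demands


def tree_from_cover(n, edges, cover):
    """Tree T from the proof of statement (1) for a vertex cover `cover`."""
    T = {("w", ("a", i)) for i in range(n) if i not in cover}
    for j, (u, _) in enumerate(edges):
        ell = 2 if u in cover else 1                  # use c_j^2 iff e_j^(1) is in the cover
        T |= {("w", ("c", j, ell)), (("c", j, ell), ("b", j))}
    return T


def solution_cost(cost, demands, T):
    """Cost of the edges of T plus penalties of demands whose endpoints T does not connect."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]             # path halving
            x = parent[x]
        return x

    for a, b in T:
        parent[find(a)] = find(b)
    penalty = sum(p for (s, t), p in demands.items() if find(s) != find(t))
    return sum(cost[e] for e in T) + penalty


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cost, demands = build_instance(4, K4)
total = solution_cost(cost, demands, tree_from_cover(4, K4, {0, 1, 2}))  # 2m + 2n + k = 23
```

Using the (non-minimum) cover $\{0, 1, 2, 3\}$ instead gives $2m + 2n + 4 = 24$, in line with the bound $2m + 2n + k$.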

To prove (2), suppose that subgraph $F$ of $H$ is a solution such that the sum of the cost of $F$ and the penalties is at most $2m + 2n + k$. We can assume that for every $1 \le j \le m$, vertex $b_j$ can be reached from $w$: otherwise we can decrease the penalty by 3 at the cost of adding two edges of cost 1. Furthermore, we can assume that only one of $c_j^1$ and $c_j^2$ can be reached from $w$: otherwise we can remove an edge without disconnecting $b_j$ from $w$; thus the cost decreases by 1 and the penalty increases by at most 1. Finally, we can assume that if $\{w, a_i\} \in F$, then all 3 demands containing $a_i$ are connected: otherwise removing $\{w, a_i\}$ decreases the cost by 2 and increases the penalty by at most 2.

Let vertex $v_i$ be in $C$ if and only if $\{w, a_i\} \notin F$. We claim that $C$ is a vertex cover of size at most $k$. To see that $C$ is a vertex cover, consider an edge $e_j$. We have observed above that one of $c_j^1$ and $c_j^2$ cannot be reached from $w$. If $c_j^1$ cannot be reached from $w$ and $e_j^{(1)} = v_i$, then the demand $\{a_i, c_j^1\}$ is not connected by $F$. Therefore, not all 3 demands containing $a_i$ are connected, which means (as observed above) that $\{w, a_i\} \notin F$. Thus $v_i \in C$, covering the edge $e_j$. The case where $c_j^2$ cannot be reached from $w$ is symmetric, with $e_j^{(2)}$ in place of $e_j^{(1)}$.

Since every $b_j$ can be reached from $w$ and $\{w, a_i\} \in F$ if $v_i \notin C$, the cost of $F$ is at least $2m + 2(n - |C|)$. Furthermore, if $v_i \in C$, then $\{w, a_i\} \notin F$, which means that we have to pay the penalty for the 3 demands containing $a_i$. Therefore, the total cost of the solution is at least $2m + 2n + |C|$. We assumed that the cost of the solution is at most $2m + 2n + k$; thus $|C| \le k$ follows, which is what we had to prove. □

The proof for the Euclidean version is very similar to the graph version. The main difference is 
that the central vertex w is replaced by a set of points arranged along a long vertical path. 

Proof of Theorem 3(2). We reduce an instance $I$ of Vertex Cover on 3-regular graphs to an instance $I'$ of PCSF on points in the Euclidean plane. If $n$ denotes the number of vertices of the 3-regular graph $G$ in $I$, then the number of edges is $m = 3n/2$. We will denote the $i$-th vertex of $G$ by $v_i$, the $j$-th edge by $e_j$, and the first and second endpoints of $e_j$ by $e_j^{(1)}$ and $e_j^{(2)}$, respectively.

We now specify the reduction (illustrated in Figure 2). Let us define $U := 10000(n + m)$ ("basic unit of cost"), $H := 10U$ ("horizontal length"), and $V := 100U$ ("vertical spacing"). Instance $I'$ contains the following set $P$ of points, where $x$ and $y$ range over integers:

• $z_{0,y} = (0, y)$ for every $-mV \le y \le nV$,

• $z_{x,y} = (x, y)$ for every $0 < x \le H$ and $y = iV$ for $1 \le i \le n$,

• $z_{x,y} = (x, y)$ and $z_{x,y+4U} = (x, y + 4U)$ for every $0 < x \le H$ and $y = -jV$ for $1 \le j \le m$,

• $a_i = (H + 2U, iV)$ for $1 \le i \le n$,

• $b_j = (H, -jV + 2U)$ for $1 \le j \le m$,

• $c_j^1 = (H, -jV + U)$ and $c_j^2 = (H, -jV + 3U)$ for $1 \le j \le m$.

Let $Z$ be the set of all points $z_{x,y}$ in $P$; note that $|Z| = V(n + m) + 1 + (n + 2m)H$. For ease of notation, we define $w_i := z_{H, iV}$, $w_j^1 := z_{H, -jV}$, and $w_j^2 := z_{H, -jV + 4U}$.
The instance contains the following demands: 

1. If $z_{x,y}$ and $z_{x+1,y}$ are both in $P$, then there is a demand $\{z_{x,y}, z_{x+1,y}\}$ with penalty 1.

2. If $z_{x,y}$ and $z_{x,y+1}$ are both in $P$, then there is a demand $\{z_{x,y}, z_{x,y+1}\}$ with penalty 1.

3. $\{(0,0), b_j\}$ with penalty $3U$ ($1 \le j \le m$),

4. If $v_i = e_j^{(\ell)}$ for some $1 \le i \le n$, $1 \le j \le m$, and $\ell \in \{1, 2\}$, then $\{a_i, c_j^\ell\}$ is a demand with penalty $U - 10$.

The total number of demands is $|Z| - 1 + m + 3n$, and each $a_i$ appears in exactly 3 demands. We claim that the cost of the optimum solution of $I'$ is between $|Z| + (2m + 2n + \tau(G))U - 100n$ and $|Z| + (2m + 2n + \tau(G))U$, where $\tau(G)$ is the size of a minimum vertex cover in $G$. Note that $m = 3n/2$ and $\tau(G) \ge m/3$; thus $|Z| + (2m + 2n + \tau(G))U$ is at most a constant factor larger than $\tau(G)U$.

More precisely, in order to prove the correctness of the reduction, we prove the following two 
statements: 

(1) Given a vertex cover of size $k$ for $G$, a solution of cost at most $|Z| + (2m + 2n + k)U$ for $I'$ can be constructed.

(2) Given a solution of cost at most $|Z| + (2m + 2n + k)U$ for $I'$, a vertex cover of size at most $k$ can be constructed.

To prove (1), suppose that C is a vertex cover of size k for G. Let F be the forest (actually, a tree) 
that contains 

1. edge $\{z_{x,y}, z_{x+1,y}\}$ if both these points are in $P$,

2. edge $\{z_{x,y}, z_{x,y+1}\}$ if both these points are in $P$,

3. edge $\{w_i, a_i\}$ if $v_i \notin C$,

4. edges $\{w_j^1, c_j^1\}$ and $\{c_j^1, b_j\}$ if $e_j^{(1)} \notin C$,

5. edges $\{w_j^2, c_j^2\}$ and $\{c_j^2, b_j\}$ if $e_j^{(1)} \in C$.

The total cost of $F$ is $|Z| - 1 + 2U(n - k) + 2Um$. Observe that all the demands $\{(0,0), b_j\}$ are satisfied. Furthermore, if $v_i \notin C$, then all three demands where $a_i$ appears are satisfied. This can be seen as follows. First, $a_i$ is in the same component as $w_i$ and hence as every vertex of $Z$. If $v_i = e_j^{(1)}$, then there is a demand $\{a_i, c_j^1\}$, and $c_j^1$ is connected with $w_j^1$ (and hence with $a_i$). If $v_i = e_j^{(2)}$, then $v_i \notin C$ means that $e_j^{(1)} \in C$ must hold, and therefore $c_j^2$ is connected to $w_j^2$, satisfying the demand $\{a_i, c_j^2\}$. Thus the total penalty is at most $3k(U - 10)$, and hence the cost of the solution is at most $|Z| - 1 + (2m + 2n + k)U - 30k$, as claimed.

To prove (2), suppose that forest $F$ is an optimum solution such that the sum of the cost of $F$ and the penalties is at most $|Z| + (2n + 2m + k)U$. First, we can assume that every demand of the first two types is satisfied: if, say, $\{z_{x,y}, z_{x+1,y}\}$ is not satisfied, then we can extend $F$ by adding an edge of cost 1, which decreases the penalty by at least 1. Thus all the points $z_{x,y}$ are in the same connected component $K$ of $F$. We can also assume that every demand of the third type is satisfied: if $\{(0,0), b_j\}$ is not satisfied, then we can decrease the penalty by $3U$ at the cost of $2U$ by adding the edges $\{w_j^1, c_j^1\}$ and $\{c_j^1, b_j\}$, contradicting the optimality of $F$. Therefore, every vertex $b_j$ is in the component $K$.

Let $Z' = \{z_{x,y} \in Z : x = 0 \text{ or } x \ge 10\}$. Let $R$ be the region of the plane at Manhattan distance at most 3 from $Z'$. Note that $R$ consists of one "vertical" and $n + 2m$ "horizontal" components.

We claim that the cost of $F$ inside $R$ is at least $|Z'|$. We have seen above that a single component $K$ of $F$ contains every point of $P \cap R$. The restriction of $K$ to $R$ gives rise to several components. Consider such a component $K'$ containing a subset $S \subseteq Z'$ of vertices. We show that the cost of $K'$ is at least $|S|$. The vertices of $S$ lie on a horizontal or vertical line. This means that there are two vertices $s_1, s_2 \in S$ at distance $d \ge |S| - 1$. As $K$ is not contained fully in any component of $R$, component $K'$ has to contain a point $s_3$ on the boundary of $R$. As $s_3$ is at distance at least 3 from $s_1$ and $s_2$, it can be verified that any Steiner tree of $s_1, s_2, s_3$ has cost at least $d + 1 \ge |S|$. Summing over every component $K'$ of the restriction of $K$ to $R$, we get that the cost of $K$ in $R$ is at least $|Z'|$.

Let $R^+$ be the region of the plane at Manhattan distance at most 3 from $Z$. We claim that the cost of every component of $F \setminus R^+$ is at most $3U$. There are two types of components of $F \setminus R^+$: (1) those that contain a point of $P$ and (2) those that do not contain such a point. Clearly, there are at most $n + 3m$ components of the first type. Suppose that there is a component $D$ of the second type having cost more than $3U$. In this case, we modify $F$ to obtain a better solution as follows. Consider $F \setminus R^+$ (i.e., let us remove the part of $F$ inside $R^+$) and let us remove every component of the second type. After that, let us add all the $|Z| - 1$ edges of the form $\{z_{x,y}, z_{x+1,y}\}$ and $\{z_{x,y}, z_{x,y+1}\}$. Finally, for every component of the first type, if it intersects $R^+$, then let us choose a point of the component on the boundary of $R^+$ and connect this point to the nearest vertex of $Z$. It is clear that the new forest $F'$ satisfies every demand satisfied by $F$: every point of $P$ connected to $Z$ remains connected to $Z$. By our claim in the previous paragraph, the cost of $F \setminus R^+$ is less than the cost of $F$ by at least $|Z'| = |Z| - 9(n + 2m)$. Removing components of the second type decreases the cost by more than $3U$ (as there is at least one such component having cost more than $3U$). The edges connecting $Z$ increase the cost by $|Z| - 1$. Adding the new connections corresponding to the components of the first type increases the cost by at most $3(n + 3m)$, since each such connection has length at most 3. As $3U > 9(n + 2m) - 1 + 3(n + 3m)$, forest $F'$ is a strictly better solution, a contradiction.

Figure 2: Illustrating the reduction from 3-Regular Vertex Cover to Euclidean PCSF.

Suppose now that there is a component $D$ of the first type with cost more than $3U$. For $-m \le s \le n$, let $R_s$ be the region of the plane at Manhattan distance at most $4U$ from $(H, sV)$. Observe that for each $s$, all the points of $P \cap R_s$ can be connected to the nearest point of $Z$ with a total cost of at most $3U$. This means that if $D$ intersects only one of these regions, say $R_s$, then we can substitute $D$ at cost at most $3U$ in such a way that every demand satisfied by $F$ remains satisfied, contradicting the optimality of $F$. Suppose therefore that $D$ intersects $t \ge 2$ of these regions; in this case, the cost of $D$ is at least $(t - 1)(V - 8U) = 92(t - 1)U \ge 46tU > 3tU$. Let us replace $D$ by connecting every point of $P \cap D$ to the closest vertex of $Z$. The new connections increase the cost by at most $t \cdot 3U$, which is less than the cost of $D$, a contradiction.

We have proved that for every component $D$ of $F \setminus R^+$, $D \cap P$ is either a single $a_i$ or a subset of $\{b_j, c_j^1, c_j^2\}$. Therefore, every such component $D$ intersects $R^+$: otherwise, $D$ could be safely removed, as it does not satisfy any demand. Next we show that it can be assumed that only one of $c_j^1$ and $c_j^2$ is in $K$. Otherwise we can remove every component of $F \setminus R^+$ intersecting $\{b_j, c_j^1, c_j^2\}$ and replace them with the edges $\{w_j^1, c_j^1\}$ and $\{c_j^1, b_j\}$. The total cost of the components we removed is at least $(2U - 3) + (U - 3)$ (which is the minimum cost of connecting $b_j$, $c_j^1$, $c_j^2$ to $R^+$), and the new edges have cost $2U$. This transformation might disconnect the demand containing $c_j^2$; hence the penalty can increase by at most $U - 10$ only, contradicting the optimality of $F$.

We can assume that if $a_i$ is in $K$, then all 3 demands containing $a_i$ are connected: otherwise removing the component of $F \setminus R^+$ containing $a_i$ decreases the cost by at least $2U - 3$ and increases the penalty by at most $2(U - 10)$.

Let vertex $v_i$ be in $C$ if and only if $a_i$ is not in component $K$. We claim that $C$ is a vertex cover of size at most $k$. To see that $C$ is a vertex cover, consider an edge $e_j$. We have observed above that one of $c_j^1$ and $c_j^2$ is not in $K$. If $c_j^1 \notin K$ and $e_j^{(1)} = v_i$, then the demand $\{a_i, c_j^1\}$ is not connected by $F$. Therefore, not all 3 demands containing $a_i$ are connected, which means (as observed above) that $a_i$ is not in $K$. Thus $v_i \in C$, covering the edge $e_j$. Similarly, if $c_j^2 \notin K$, then $e_j^{(2)} \in C$.

The cost of $F \cap R^+$ is at least $|Z| - 9(n + 2m)$. Since every $b_j$ is in $K$ and $a_i$ is in $K$ if $v_i \notin C$, the cost of $F \setminus R^+$ is at least $(2U - 3)m + (2U - 3)(n - |C|)$. Furthermore, if $v_i \in C$, then we have to pay the penalty for the 3 demands containing $a_i$. Therefore, the total cost of the solution is at least

$$|Z| - 9(n + 2m) + (2U - 3)m + (2U - 3)(n - |C|) + 3|C|(U - 10) \ge |Z| + (2m + 2n + |C|)U - 100n.$$

We assumed that the cost of the solution is at most $|Z| + (2m + 2n + k)U$. As $U > 100n$, this is only possible if $|C| \le k$, which is what we had to prove. □
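The final inequality can be double-checked with exact integer arithmetic. After cancelling $|Z|$, the left side minus the right side equals $88n - 21m - 27|C|$, which is positive for $m = 3n/2$ and $|C| \le n$. The sketch below (the function name is hypothetical) evaluates the difference directly for many even $n$, since 3-regular graphs have an even number of vertices, and for all $0 \le |C| \le n$.

```python
def lower_bound_gap(n, C):
    """Left side minus right side of the final inequality, with |Z| cancelled (m = 3n/2)."""
    m = 3 * n // 2
    U = 10000 * (n + m)
    lhs = -9 * (n + 2 * m) + (2 * U - 3) * m + (2 * U - 3) * (n - C) + 3 * C * (U - 10)
    rhs = (2 * m + 2 * n + C) * U - 100 * n
    return lhs - rhs


# The gap should be positive for every even n and every cover size 0 <= C <= n.
gap_ok = all(lower_bound_gap(n, C) > 0 for n in range(2, 200, 2) for C in range(n + 1))
```

For example, $n = 4$, $m = 6$, $|C| = 4$ gives $88 \cdot 4 - 21 \cdot 6 - 27 \cdot 4 = 118$.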
A Basic graph theory definitions

Let $G(V, E)$ be a graph. As is customary, let $\delta(V')$ denote the set of edges having exactly one endpoint in a subset $V' \subseteq V$ of vertices. For a subset of vertices $V' \subseteq V$, the subgraph of $G$ induced by $V'$ is denoted by $G[V']$. With slight abuse of notation, we sometimes use the edge set to refer to the graph itself. Hence, the above-mentioned subgraph may also be referred to by $E[V']$ for simplicity. We denote the length of a shortest $x$-to-$y$ path in $G$ as $\mathrm{dist}_G(x, y)$. For an edge set $E$, we denote by $\ell(E) := \sum_{e \in E} c_e$ the total length of the edges in $E$.

Given an edge $e = (u, v)$ in a graph $G$, the contraction of $e$ in $G$, denoted by $G/e$, is the result of unifying vertices $u$ and $v$ in $G$ and removing all loops and multiple edges except the shortest edge.
More formally, the contracted graph G/e is formed by the replacement of u and v with a single 
vertex such that edges incident to the new vertex are the edges other than e that were incident 
with $u$ or $v$. To obtain a simple graph, we first remove all self-loops in the resulting graph. In case of multiple edges, we only keep the shortest edge and remove all the rest. The contraction $G/E'$ is defined as the result of iteratively contracting all the edges of $E'$ in $G$, i.e., $G/E' := G/e_1/e_2 \cdots /e_k$ if $E' = \{e_1, e_2, \ldots, e_k\}$. Clearly, the planarity of $G$ is preserved after the contraction. Similarly,
contracting edges does not increase the cost of an optimal Steiner forest. 
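A minimal sketch of the contraction $G/e$ described above, assuming a simple undirected graph stored as a dictionary from `frozenset({x, y})` to edge length (a hypothetical representation). The merged vertex keeps the name $u$; self-loops are dropped and, among parallel edges, only the shortest is kept.

```python
def contract(edges, u, v):
    """G/e for e = (u, v): merge v into u, drop self-loops, keep the shortest parallel edge.

    edges: dict mapping frozenset({x, y}) -> length of a simple undirected graph.
    """
    result = {}
    for e, length in edges.items():
        x, y = tuple(e)
        x = u if x == v else x  # rename v to u
        y = u if y == v else y
        if x == y:              # self-loop (this includes e itself)
            continue
        key = frozenset((x, y))
        if key not in result or length < result[key]:
            result[key] = length
    return result


# Usage: contracting ab in the triangle a-b (1), b-c (2), a-c (5) leaves a single edge ac of length 2.
triangle = {frozenset("ab"): 1, frozenset("bc"): 2, frozenset("ac"): 5}
contracted = contract(triangle, "a", "b")
```

Contracting a set $E' = \{e_1, \ldots, e_k\}$ amounts to applying this function repeatedly, renaming the endpoints of later edges if they were merged earlier.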

The boundary of a face of a planar embedded graph is the set of edges adjacent to the face; it 
does not always form a simple cycle. The boundary $\partial H$ of a planar embedded graph $H$ is the set of edges bounding the infinite face. An edge is strictly enclosed by the boundary of $H$ if the edge belongs to $H$ but not to $\partial H$.

Now we define the basic notion of treewidth, as introduced by Robertson and Seymour [50]. To 
define this notion, we consider representing a graph by a tree structure, called a tree decomposition. 
More precisely, a tree decomposition of a graph $G(V, E)$ is a pair $(T, \mathcal{B})$ in which $T(I, F)$ is a tree and $\mathcal{B} = \{B_i \mid i \in I\}$ is a family of subsets of $V(G)$ such that 1) $\bigcup_{i \in I} B_i = V$; 2) for each edge $e = (u, v) \in E$, there exists an $i \in I$ such that both $u$ and $v$ belong to $B_i$; and 3) for every $v \in V$, the set of nodes $\{i \in I \mid v \in B_i\}$ forms a connected subtree of $T$.
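The three conditions can be checked mechanically. The sketch below (names hypothetical) takes the bags as a dictionary from tree nodes to vertex sets and assumes `tree_edges` already forms a tree; condition 3) is tested by a graph search restricted to the nodes whose bags contain $v$. It also computes the width, i.e., the maximum bag size minus 1.

```python
def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check conditions 1)-3) of a tree decomposition (tree_edges is assumed to be a tree)."""
    # 1) every vertex appears in some bag
    if set().union(*bags.values()) != set(vertices):
        return False
    # 2) both endpoints of every edge lie in a common bag
    if not all(any({u, v} <= B for B in bags.values()) for u, v in edges):
        return False
    # 3) the nodes whose bags contain v induce a connected subtree of T
    for v in vertices:
        nodes = {i for i, B in bags.items() if v in B}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for a, b in tree_edges:
                for x, y in ((a, b), (b, a)):
                    if x == i and y in nodes and y not in seen:
                        seen.add(y)
                        stack.append(y)
        if seen != nodes:
            return False
    return True


def width(bags):
    """Maximum bag size minus 1."""
    return max(len(B) for B in bags.values()) - 1


# Usage: the 4-cycle a-b-c-d-a has a decomposition with two bags of size 3 (width 2).
C4 = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
C4_bags = {1: {"a", "b", "c"}, 2: {"a", "c", "d"}}
```

A decomposition in which vertex `a` appears in bags 1 and 3 of the path 1-2-3 but not in bag 2 violates condition 3) and is rejected.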

To distinguish between vertices of the original graph $G$ and vertices of $T$ in the tree decomposition, we call vertices of $T$ nodes and their corresponding $B_i$'s bags. The width of the tree
decomposition is the maximum size of a bag in B minus 1. The treewidth of a graph G, denoted 
tw(G), is the minimum width over all possible tree decompositions of G. 

For algorithmic purposes, it is convenient to define a restricted form of tree decomposition. We 
say that a tree decomposition $(T, \mathcal{B})$ is nice if the tree $T$ is a rooted tree such that for every $i \in I$ either

1. $i$ has no children ($i$ is a leaf node),

2. $i$ has exactly two children $i_1$, $i_2$ and $B_i = B_{i_1} = B_{i_2}$ holds ($i$ is a join node),

3. $i$ has a single child $i'$ and $B_i = B_{i'} \cup \{v\}$ for some $v \in V$ ($i$ is an introduce node), or

4. $i$ has a single child $i'$ and $B_i = B_{i'} \setminus \{v\}$ for some $v \in V$ ($i$ is a forget node).

It is well-known that every tree decomposition can be transformed into a nice tree decomposition 
of the same width in polynomial time. Furthermore, we can assume that the root bag contains 
only a single vertex. 
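The four cases can be recognized directly from the bags. In this sketch, `children` maps each node of the rooted tree to a list of its children (a hypothetical representation); the function returns the node type, or `None` if the node violates the definition of a nice tree decomposition.

```python
def node_type(i, children, bags):
    """Classify node i of a rooted tree decomposition as leaf/join/introduce/forget, or None."""
    kids = children.get(i, [])
    B = bags[i]
    if not kids:
        return "leaf"
    if len(kids) == 2:
        i1, i2 = kids
        return "join" if B == bags[i1] == bags[i2] else None
    if len(kids) == 1:
        Bc = bags[kids[0]]
        if Bc < B and len(B - Bc) == 1:  # B_i = B_{i'} plus one new vertex
            return "introduce"
        if B < Bc and len(Bc - B) == 1:  # B_i = B_{i'} minus one vertex
            return "forget"
    return None


# Usage: root r forgets b, its child x introduces b, and y is a leaf.
bags = {"r": {"a"}, "x": {"a", "b"}, "y": {"a"}}
children = {"r": ["x"], "x": ["y"]}
```

Here `<` is Python's proper-subset test on sets, so the single-child cases match the introduce and forget rules exactly.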

We also need a basic notion of embedding; see, e.g., [51, 17]. In this paper, an embedding refers 
to a 2-cell embedding, i.e., a drawing of the vertices and edges of the graph as points and arcs in 
a surface such that every face (connected component obtained after removing edges and vertices 
of the embedded graph) is homeomorphic to an open disk. We use basic terminology and notions 
about embeddings as introduced in [48]. We only consider compact surfaces without boundary. 
Occasionally, we refer to embeddings in the plane, when we actually mean embeddings in the 2-sphere. If $S$ is a surface, then for a graph $G$ that is (2-cell) embedded in $S$ with $f$ facial walks, the number $g = 2 - |V(G)| + |E(G)| - f$ is independent of $G$ and is called the Euler genus of $S$. The Euler genus coincides with the crosscap number if $S$ is non-orientable, and equals twice the usual genus if the surface $S$ is orientable.
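The Euler genus formula $g = 2 - |V(G)| + |E(G)| - f$ can be checked on two familiar 2-cell embeddings. The short sketch below only evaluates the formula from vertex, edge and face counts, which are assumed to be known (computing the faces of an embedding is not shown); the function name `euler_genus` is hypothetical.

```python
def euler_genus(num_vertices: int, num_edges: int, num_faces: int) -> int:
    """Euler genus g = 2 - |V(G)| + |E(G)| - f of the surface of a 2-cell embedding."""
    return 2 - num_vertices + num_edges - num_faces


# Cube graph drawn on the sphere: 8 vertices, 12 edges, 6 faces -> g = 0.
print(euler_genus(8, 12, 6))
# K7 triangulating the torus: 7 vertices, 21 edges, 14 triangles -> g = 2,
# i.e., twice the orientable genus 1 of the torus.
print(euler_genus(7, 21, 14))
```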
B PCST, PCTSP and PCS on bounded-treewidth graphs



Treewidth measures how similar a graph is to a tree. Since tree structure lends itself to dynamic programming, many optimization problems can be solved in polynomial time on graphs of bounded treewidth; Bodlaender and Koster [14] give a comprehensive survey of this topic. In particular, several Steiner network problems become relatively easy when restricted to bounded-treewidth graphs, among them Steiner Tree, TSP and Stroll. One surprising outlier is Steiner Forest, which is NP-hard on such graphs yet admits a PTAS [12]. In this section, we study the prize-collecting extensions of the above problems and, where possible, give polynomial-time algorithms for them. More specifically, we present exact polynomial-time dynamic programming algorithms for PCST, PCTSP and PCS on bounded-treewidth graphs. We already showed in Section 5 that PCSF is APX-hard even on series-parallel graphs; the proof extends to show APX-hardness of PCSF in the Euclidean plane.

We focus the discussion on PCST; minor modifications allow us to solve PCTSP and PCS, too. We are given a weighted graph $G = (V, E)$ of treewidth $k - 1$ for a fixed parameter $k$, and a penalty function $\pi : V \to \mathbb{R}^+$. We have a nice tree decomposition $(T, \mathcal{B})$ of $G$, so each bag $B_i$ has size at most $k$. The vertices of $B_i$ are sometimes called the portals of the subtree below node $i$. Let $I$ denote the set of nodes of $T$, and for each $i \in I$, let $T_i$ be the subtree of $T$ rooted at $i$. A dynamic programming entry is specified by a tuple $(i, S, \mathcal{P})$ where

• $i \in I$ is a node in the tree decomposition,

• $S \subseteq B_i$ is a subset of the portals of the subtree $T_i$, and

• $\mathcal{P}$ is a partition of $S$.

Let us denote by $V_i$ the vertices corresponding to the subtree $T_i$, i.e., $V_i := \bigcup_{j \in T_i} B_j$. A dynamic programming entry $\mathrm{DP}(i, S, \mathcal{P})$ stores the least cost of building a subgraph $H$ such that

• $H$ uses only edges with both endpoints in $V_i$,

• $H$ connects the vertices in each set $P_j$ of the partition $\mathcal{P} = \{P_1, P_2, \dots, P_m\}$, and

• $S$ is the subset of $B_i$ whose penalties are not paid; moreover, if a vertex $v \in V_i$ is not connected to $S$ via $H$, then its penalty $\pi(v)$ is included in the total cost.

The final solution to the problem is $\min_S \mathrm{DP}(r, S, \{S\})$, where $r$ is the root of the tree decomposition; that is, it does not matter which subset of the root bag is picked as long as its vertices form a single component.

The DP entries are easy to compute for leaves: let $B_i = \{v\}$ for a leaf $i$. There are two possibilities: $\mathrm{DP}(i, \emptyset, \emptyset) = \pi(v)$ and $\mathrm{DP}(i, \{v\}, \{\{v\}\}) = 0$. The update procedure works as follows for the different types of tree nodes:

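Introduce node: $i$ is the parent of $i'$, and we have $B_i = B_{i'} \cup \{v\}$. Then $\mathrm{DP}(i, S, \mathcal{P}) = \pi(v) + \mathrm{DP}(i', S, \mathcal{P})$ if $v \notin S$. Next consider an entry $\mathrm{DP}(i, S, \mathcal{P})$ with $v \in S$ and $\mathcal{P} = \{P_1, P_2, \dots, P_m\}$, where $v \in P_1$. Let $\mathcal{P}' := \{P_1 \setminus \{v\}, P_2, \dots, P_m\}$ and let $d$ be the distance of $v$ to the set $P_1 \setminus \{v\}$ (if $P_1 = \{v\}$, then $d = 0$ and the empty set is dropped from $\mathcal{P}'$). The dynamic programming sets $\mathrm{DP}(i, S, \mathcal{P}) = d + \mathrm{DP}(i', S \setminus \{v\}, \mathcal{P}')$.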
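The leaf and introduce rules can be sketched in Python as follows. This is a minimal illustration under several assumptions: a DP table is a dictionary keyed by pairs $(S, \mathcal{P})$ encoded as frozensets, the distance $d$ is supplied by a caller-provided function `dist`, a singleton block $\{v\}$ is charged $d = 0$, and the names `leaf_table` and `introduce_table` are hypothetical. Rather than evaluating each entry of node $i$ separately, the sketch pushes every child entry forward; since removing $v$ from $(S, \mathcal{P})$ determines a unique child entry, this gives the same values as the introduce rule.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

Block = FrozenSet[Hashable]
State = Tuple[FrozenSet[Hashable], FrozenSet[Block]]  # (S, partition of S)


def leaf_table(v: Hashable, penalty: Dict[Hashable, float]) -> Dict[State, float]:
    """Entries of a leaf with bag {v}: pay pi(v), or keep v with no edges."""
    return {
        (frozenset(), frozenset()): penalty[v],
        (frozenset({v}), frozenset({frozenset({v})})): 0.0,
    }


def introduce_table(child: Dict[State, float], v: Hashable,
                    penalty: Dict[Hashable, float],
                    dist: Callable[[Hashable, Block], float]) -> Dict[State, float]:
    """Entries of an introduce node whose bag is the child's bag plus v."""
    table: Dict[State, float] = {}

    def relax(state: State, cost: float) -> None:
        if cost < table.get(state, float("inf")):
            table[state] = cost

    for (s, part), cost in child.items():
        relax((s, part), penalty[v] + cost)          # v not in S: pay pi(v)
        s_v = s | {v}
        relax((s_v, part | {frozenset({v})}), cost)  # v in S as a singleton block: d = 0
        for block in part:                           # v joins block P_1: pay d(v, P_1)
            merged = (part - {block}) | {block | {v}}
            relax((s_v, merged), dist(v, block) + cost)
    return table


# Hypothetical example: edge a-b of weight 3, penalties pi(a) = 5, pi(b) = 1.
weight = {frozenset({"a", "b"}): 3.0}
penalty = {"a": 5.0, "b": 1.0}


def dist(v: Hashable, block: Block) -> float:
    """Hypothetical distance: cheapest direct edge from v into the block."""
    return min(weight[frozenset({v, u})] for u in block)


table = introduce_table(leaf_table("a", penalty), "b", penalty, dist)
for (s, part), cost in sorted(table.items(), key=lambda kv: kv[1]):
    print(sorted(s), [sorted(p) for p in part], cost)
```

In this example, connecting $a$ and $b$ costs 3, while keeping both without paying their penalties but leaving them in separate components costs 0.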
Forget node: $i$ is the parent of $i'$, and we have $B_{i'} = B_i \cup \{v\}$. Then

$$\mathrm{DP}(i, S, \mathcal{P}) = \min\Big\{\pi(v) + \mathrm{DP}(i', S, \mathcal{P}),\ \min\{\mathrm{DP}(i', S \cup \{v\}, \mathcal{P}') : \mathcal{P}' \text{ is formed by adding } v \text{ to a set of } \mathcal{P}\}\Big\}.$$

The first term considers the case where we pay the penalty for $v$ and do not connect it in the final Steiner tree, whereas the second term accounts for the case where $v$ is connected to one of the components of the partition.
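In a hypothetical dictionary encoding of DP tables keyed by $(S, \mathcal{P})$, the forget rule can be transcribed directly: a child entry with $v \notin S$ feeds the first term, and a child entry in which $v$ belongs to a larger block feeds the second term once $v$ is removed from that block. A child entry in which $v$ forms a block on its own is not of the form "formed by adding $v$ to a set of $\mathcal{P}$", so the sketch skips it, following the recurrence as written. The function name `forget_table` and the example values are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Block = FrozenSet[Hashable]
State = Tuple[FrozenSet[Hashable], FrozenSet[Block]]  # (S, partition of S)


def forget_table(child: Dict[State, float], v: Hashable,
                 penalty: Dict[Hashable, float]) -> Dict[State, float]:
    """Entries of a forget node whose bag is the child's bag minus v."""
    table: Dict[State, float] = {}

    def relax(state: State, cost: float) -> None:
        if cost < table.get(state, float("inf")):
            table[state] = cost

    for (s, part), cost in child.items():
        if v not in s:
            relax((s, part), penalty[v] + cost)  # first term: pi(v) + DP(i', S, P)
            continue
        block = next(b for b in part if v in b)
        if len(block) == 1:
            continue                             # v alone: not "added to a set of P"
        reduced = (part - {block}) | {block - {v}}
        relax((s - {v}, reduced), cost)          # second term: DP(i', S | {v}, P')
    return table


# Hypothetical child entries over the bag {a, b}; we forget a.
a, b, ab = frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})
child = {
    (b, frozenset({b})): 5.0,    # a not in S
    (ab, frozenset({ab})): 3.0,  # a and b connected
    (a, frozenset({a})): 1.0,    # a alone: skipped
}
result = forget_table(child, "a", {"a": 5.0, "b": 1.0})
print(result)
```

Here the entry for $S = \{b\}$ takes the cheaper of the two terms, namely the one in which $a$ was connected to $b$.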

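Join node: $i$ has two children $i_1$ and $i_2$ with the same bag. We set

$$\mathrm{DP}(i, S, \mathcal{P}) = \min \{\mathrm{DP}(i_1, S, \mathcal{P}_1) + \mathrm{DP}(i_2, S, \mathcal{P}_2) - \pi(B_i \setminus S)\},$$

where the minimization goes over all pairs $\mathcal{P}_1$ and $\mathcal{P}_2$ whose combined connectivity implies that of $\mathcal{P}$, i.e., every set of $\mathcal{P}$ lies inside a single set of the finest partition that is coarser than both $\mathcal{P}_1$ and $\mathcal{P}_2$. Here $\pi(B_i \setminus S) := \sum_{u \in B_i \setminus S} \pi(u)$; this last term cancels the double charging of the unsatisfied terminals of $B_i$, whose penalties are paid in both children.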
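The join rule can be sketched as follows, again assuming a hypothetical dictionary encoding of DP tables keyed by $(S, \mathcal{P})$. The combined connectivity of $\mathcal{P}_1$ and $\mathcal{P}_2$ is computed with union-find as the finest partition coarser than both, and a pair is accepted when every set of the target partition lies inside one of its blocks. The helper names `partition_join`, `coarsens` and `join_entry` are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Block = FrozenSet[Hashable]
Partition = FrozenSet[Block]
State = Tuple[FrozenSet[Hashable], Partition]  # (S, partition of S)


def partition_join(p1: Partition, p2: Partition) -> Partition:
    """Finest partition coarser than both p1 and p2, computed with union-find."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    blocks = list(p1) + list(p2)
    for block in blocks:
        for x in block:
            parent.setdefault(x, x)
    for block in blocks:  # blocks are assumed nonempty
        first, *rest = block
        for x in rest:
            parent[find(x)] = find(first)
    groups: Dict[Hashable, set] = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(g) for g in groups.values())


def coarsens(coarse: Partition, fine: Partition) -> bool:
    """True if every block of `fine` lies inside a single block of `coarse`."""
    return all(any(blk <= c for c in coarse) for blk in fine)


def join_entry(left: Dict[State, float], right: Dict[State, float],
               s: FrozenSet[Hashable], target: Partition,
               bag: FrozenSet[Hashable], penalty: Dict[Hashable, float]) -> float:
    """DP(i, S, target) at a join node with children tables `left` and `right`."""
    best = float("inf")
    for (s1, p1), c1 in left.items():
        for (s2, p2), c2 in right.items():
            if s1 == s2 == s and coarsens(partition_join(p1, p2), target):
                best = min(best, c1 + c2)
    # Penalties of bag vertices outside S were charged in both children.
    return best - sum(penalty[u] for u in bag - s)


# Hypothetical children tables over the bag {a, b}.
ab = frozenset({"a", "b"})
together = frozenset({ab})
apart = frozenset({frozenset({"a"}), frozenset({"b"})})
penalty = {"a": 5.0, "b": 1.0}
left = {(ab, apart): 2.0, (ab, together): 7.0}
right = {(ab, together): 4.0, (ab, apart): 1.0}
print(join_entry(left, right, ab, together, ab, penalty))  # a and b must be connected
print(join_entry(left, right, ab, apart, ab, penalty))     # any connectivity suffices
```

Requiring $a$ and $b$ to be connected forces at least one child to connect them, which is why the first query is more expensive than the second.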

It is not difficult to verify that the algorithm produces the correct output; we defer the proof to the full version of the paper. The running time of the algorithm is polynomial in the number of DP entries, which is at most $|I| \cdot 2^k \cdot k^k$: there are at most $2^k$ choices of $S \subseteq B_i$ and at most $k^k$ partitions of $S$. Since $k$ is a constant and $|I|$ is polynomial in $n$, the running time is polynomial.
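The per-node bound can be checked numerically. The hypothetical helper below enumerates, for a bag of $k$ vertices, all pairs $(S, \mathcal{P})$ with $S$ a subset of the bag and $\mathcal{P}$ a partition of $S$, and compares the count with $2^k \cdot k^k$. For $k = 1, \dots, 5$ the counts are the Bell numbers 2, 5, 15, 52, 203, far below the bound.

```python
from itertools import combinations
from typing import Iterator, List, Sequence


def set_partitions(items: Sequence[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` in a new block
        for j in range(len(part)):                  # or `first` added to block j
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def states_per_node(k: int) -> int:
    """Number of pairs (S, P) with S a subset of a k-vertex bag and P a partition of S."""
    bag = range(k)
    return sum(1 for r in range(k + 1)
               for s in combinations(bag, r)
               for _ in set_partitions(s))


for k in range(1, 6):
    print(k, states_per_node(k), 2 ** k * k ** k)
```

For example, with $k = 3$ a node has 15 entries, compared with the bound $2^3 \cdot 3^3 = 216$.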

To extend the algorithm to PCTSP, the DP state is modified to $(i, \mathcal{P})$, where $i \in I$ is a node of the tree decomposition and $\mathcal{P}$ is a set of pairs of vertices in bag $B_i$. A pair $(s, t)$ indicates that the subsolution contains a path between $s$ and $t$ whose two ends must be extended outside the subtree $T_i$ to form a tour. The final solution is stored in $\mathrm{DP}(r, \{(r, r)\})$. The algorithm for PCS works in the same way, except that the final solution can be found as $\min_{s, t \in B_r} \mathrm{DP}(r, \{(s, t)\})$, since we do not need a closed tour.


